ABSTRACT

A distributed database (DDB) consists of redundant copies of data files
geographically  distributed  on  a  computer  network.  This  thesis  develops 
a  performance  model  of  a  DDB.  This  model  can  be  used  to  compare  the 
performance  (i.e.  response  time,  utilization,  etc.)  of  different  concurrency 
control  algorithms. 

We started by developing a network of queues model of a communication
subnetwork. We originally attempted to employ Jackson's Model
but concluded that it is inadequate for our purposes.

The  Independent  Queues  Model  that  we  developed  in  this  thesis  makes  somewhat 
stronger assumptions than Jackson's Model, but has more flexibility and
better approximates a real communication subnetwork.

We  found  that  in  a  general  DDB,  concurrency  control  algorithms  could 
not  be  modelled  accurately  without  taking  into  consideration  the  particular 
query  processing  strategy  used.  We  have  therefore  developed  two  new  query 
processing  strategies:  the  MST  and  the  MDT  Algorithms.  These  two  algorithms 
are  easy  to  analyze  and  to  implement. 

We  next  modelled  the  competition  among  different  transactions  in  the 
DDB for the services of the database management system. Probabilistic
arguments were used to determine the probability of conflicts between different
database transactions and the delay due to conflicts.

CHAPTER 1
INTRODUCTION 

This report describes a program of research to develop a performance
model of a distributed database (DDB). The field of DDB is
relatively  new  and  growing  rapidly.  Numerous  algorithms  have  been 
proposed  for  data  retrieval,  update  synchronization  and  file  allocation. 

This thesis develops a tool that will enable one to compare the performance
(i.e., response time, throughput, utilization, etc.) of the
different algorithms, and to propose better, more efficient solutions.

The  approach  will  be  to  approximate  the  communication  subnetwork 
by  a  Network  of  Queues  Model.  Probabilistic  arguments  will  be  used  to 
specialize the model to accommodate the characteristics of a DDB.

1.1  Definitions

A  database  is  a  collection  of  operational  data  used  by  the  application 
systems  of  some  particular  enterprise.  The  Database  Management  System  (DMS) 
is  the  special  software  designed  to  provide  each  application  with  its  own 
view  of  the  common  data,  to  implement  operations  for  retrieval  and  update, 
and to resolve conflicts between concurrent users. The development of
computer  communication  networks  introduced  the  concept  of  the  distributed 
database  which  is  a  database  whose  physical  copies  of  data  (often  redundant) 
are  distributed  on  a  computer  network.  The  Distributed  Database  Management 
System  (DDMS)  permits  a  collection  of  data  relevant  to  a  particular 
enterprise  to  be  managed  on  a  network  of  geographically  dispersed  computers 
(computer  sites).  Fig.  1.1  shows  the  basic  architecture  of  a  DDB.  An 
arbitrary  network  of  computer  sites  is  connected  by  communication  links. 
Attached to each computer site are the data files, sensors and terminals
through which users can access the data.

Figure 1.1  Basic Architecture of a Distributed Database

1.2  Advantages  and  Disadvantages  of  DDB 

Some  enterprises,  such  as  military  Command,  Control  and  Communications 
systems  are  distributed  in  nature;  since  command  posts  and  sensory  gathering 
points  are  geographically  dispersed,  users  are  necessarily  dispersed.  For 
example,  the  Navy  has  remote  sensors  and  databases  distributed  all  over 
the  world.  Other  potential  users  are  airline  reservation  systems,  and 
electronic  fund  transfer  systems.  A  typical  user  will  be  an  enterprise 
which  maintains  operations  at  several  geographically  dispersed  sites,  and 
whose  activities  necessitate  inter-site  communication  of  data. 

Distributed  databases  also  offer  the  following  advantages  when 
compared  to  centralized  databases: 

(1)  Improved  throughput  -  the  availability  of  multiple  computers  means 
that  throughput  can  be  increased  via  parallel  processing. 

(2)  Sharing  -  geographically  dispersed  data  and  equipment  can  be  shared. 

(3)  Modular  expansion  -  distributed  database  systems  can  be  expanded  by 
the  addition  of  new  nodes  (database  sites)  to  the  network. 

For  distributed  databases  with  redundant  copies  of  data,  there  are 
additional  advantages: 

(1)  Improved  reliability/survivability  -  through  redundancy  of  data. 

(2)  Improved  response  time  -  by  storing  data  in  locations  where  it  is 
frequently  read.  Communication  delay  is  reduced  since  files  which 
are  under  heavy  demand  in  several  geographically  dispersed  locations 
can be stored redundantly. However, more redundant copies also mean
increased delay during writes.

There  are  two  major  implementation  problems  associated  with  distributed 
databases. The first problem is that communication channels between
sites  are  often  very  slow  compared  to  the  storage  devices  at  the  local 
computer  sites.  For  example,  the  ARPANET  can  move  data  at  about  25  Kbps 
(kilobits/sec)  while  standard  disks  can  move  data  at  about  1  Mbps 
(megabits/sec), a 40-fold increase in rate. Besides, networks have
relatively  long  access  times,  corresponding  to  the  propagation  delay  for 
one  message  to  go  from  one  computer  site  to  another.  (This  propagation 
delay  is  about  0.1  sec.  for  the  ARPANET.)  The  other  problem  is  that 
communication  channels  and  computer  sites  are  susceptible  to  failures, 
giving  rise  to  networks  that  may  have  constantly  changing  topologies. 
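
As a rough illustration of these figures, the following sketch estimates the
time needed to move a file over the network and from a local disk. The rates
and the propagation delay are the ones quoted above; the 100-kilobit file
size, the function names, and the convention that 1 Kbps is 1000 bits/sec are
assumptions made only for this example, and queueing effects are ignored.

```python
# Figures quoted in the text; the file size below is a hypothetical example.
ARPANET_RATE_BPS = 25_000      # about 25 Kbps
DISK_RATE_BPS = 1_000_000      # about 1 Mbps
PROPAGATION_DELAY_S = 0.1      # about 0.1 sec per message on the ARPANET


def network_transfer_time(bits, messages=1):
    """Time to ship `bits` over the network, charging one propagation
    delay per message (a simplification that ignores queueing)."""
    return messages * PROPAGATION_DELAY_S + bits / ARPANET_RATE_BPS


def disk_transfer_time(bits):
    """Time to move `bits` from a local disk at the quoted rate."""
    return bits / DISK_RATE_BPS


rate_ratio = DISK_RATE_BPS / ARPANET_RATE_BPS   # the 40-fold difference
file_bits = 100_000                             # hypothetical 100-kilobit file
```

Under these assumptions the network transfer is dominated by the channel
rate rather than by the propagation delay once the file is more than a few
kilobits long.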

1.3  Key Technical Problems

It  is  noted  that  some  of  the  problems  associated  with  distributed 
databases  are  the  same  as  those  for  the  non-distributed  (centralized) 
database and can therefore use the same solutions. Such problems include*
choosing a good data model, designing a schema, etc. However, mainly
because of the two implementation problems associated with the distributed
database, the following problems require significantly different approaches:
(1)  data  retrieval  (or  query  processing)  -  a  query  accessing  data  stored 
at  different  sites  requires  that  data  must  be  moved  around  in  the 
network.  The  communication  delay,  and  hence  the  response  time,  depends 
strongly  on  the  choice  of  a  particular  data  storage  and  transfer 
strategy. 

(2) update synchronization (or concurrency control) - in centralized
databases, locking is the standard method used to maintain consistency
among redundant copies of data. The distributed nature of the data
in a DDB means that setting locks produces long message delays.

* The reader is referred to [DATE77] for a definition of these terms.

(3)  reliability/survivability  -  the  network  introduces  new  components 
(communication  links,  computers)  where  failure  can  occur,  and  hence 
the  associated  problems  of  failure  detection  and  failure  recovery. 

(4)  physical  database  design  (file  allocation)  -  the  problem  of  how 
many  copies  of  each  data  file  to  maintain  and  where  to  locate  them. 
The  use  of  additional  redundant  copies  generally  means  reduced 
communication  delay  associated  with  data  retrieval.  Unfortunately, 
it also means increased delay associated with update synchronization.
The problem is difficult not only because of varying file request
rates due to the users, but also because of the dynamic nature of
the network topology. Nodes may be lost due to computer and communication
link failures or a particular node moving out of range of the
communication medium. The topology, and hence the associated file
allocation, changes again when new nodes are added to the network
due  to  computer  and  link  recovery. 

1.4  Performance Modelling

1.4.1  Need  for  a  Performance  Model 

Our literature research has led us to conclude that a good model of
a distributed database system is essential to research in this area.
Such a model will enable us to compare the performance of the different
algorithms proposed. It can also help identify the decisive factors of
system performance and consequently determine the direction one should
take in further research on improvement.

Such models should be flexible enough to allow for the testing
and comparison of alternative synchronization and query processing
algorithms.

One  possible  approach  to  performance  modeling  is  to  develop  a 
simulation model. However, simulation models are generally expensive.
Besides, a simulation model only gives results for one particular
configuration of the problem, as defined by a specific set of parameters,
and does not provide as much insight as analytic models on the relationship
between the results and the parameters. We shall therefore concentrate
on developing an analytic model.

1.4.2  Review  of  Research 

While the literature on DDB abounds in concurrency control and
query processing algorithms, there is very little work done on comparing
the performance of the different proposals. Bernstein and Goodman
[BG80] analyzed the performance of principal concurrency control methods
in qualitative terms. The analysis considers four cost factors: communication
overhead, local processing overhead, transaction restarts and
transaction blocking. The assumption is that the dominant cost component
is the number of messages. Thus distance between database sites, topology
of network and queueing effects are completely ignored. A quantitative
comparison is described in the thesis of Garcia-Molina [GARC79]. He
compared several variants of the centralized locking algorithm with
Thomas's Distributed Voting Algorithm [THOM79] and the Ring Algorithm
of Ellis [ELLI77]. The major assumptions are (1) a fully redundant
database, and (2) the transmission delay between each pair of sites is
constant. The first assumption requires that the whole database is fully
replicated at each node. This is necessary because Garcia-Molina did
not want to model query processing, which would have been necessary for a
general (not fully redundant) database. The second assumption means
that the topology, message volume and queueing effects of the communication
subnetwork are ignored.

1.5  Outline of Thesis

The goal of this research is to develop a performance model of a
DDB. In particular, the model will be used to compare the performance
of different concurrency control algorithms. In Chapter 2 we describe
some well-known concurrency control algorithms. In Chapter 3, we
develop the communication subnetwork model, which is an important component
of our DDB model, described in Chapter 4. Chapter 5 starts with a
review of existing query processing algorithms. We have found them hard
to analyze and have developed two new query processing algorithms that
are easy to analyze. In Chapter 6, we introduce the conflict model,
which  allows  us  to  calculate  such  important  parameters  as  the  probability 
of  conflict  and  the  delay  due  to  conflicts.  Chapter  7  consists  of  four 
numerical  examples.  They  are  based  on  the  same  communication  subnetwork 
so  that  comparisons  can  be  made  between  different  concurrency  control 
algorithms.  We  conclude  in  Chapter  8  with  a  discussion  of  our  results 
and  suggestions  for  further  research. 


CHAPTER 2
CONCURRENCY CONTROL

This  chapter  is  devoted  to  concurrency  control,  with  two  objectives: 
to  define  the  notion  of  correctness  of  a  concurrency  control  method  and 
to  describe  existing  concurrency  control  algorithms.  Although  the 
literature  abounds  in  concurrency  control  methods,  they  can  be  divided 
into  two  major  approaches,  namely,  two-phase  locking  and  timestamp 
ordering.  These  will  be  discussed  in  sections  2.3  and  2.4.  First  we 
shall  describe  the  concurrency  control  problem. 

2.1  The Concurrency Control Problem

In a database system, data items are related in certain ways.
These relationships can be thought of as assertions about the data
items. Consider two data items x and y; examples of assertions are:
x <= y, x > 0. The database system is said to be consistent when its
data items satisfy all its assertions, or consistency constraints.

Assume that the basic unit of user computation is the transaction.
A transaction executes in three steps, each of which is assumed indivisible:

(1) Read - the transaction reads some data into a local workspace.

(2) Compute - computation is performed on the workspace.

(3)  Write  -  some  values  in  the  workspace  are  written  back  into  the 
database. 

If  user  requests  are  not  coordinated,  the  execution  of  steps  in 
different  transactions  from  different  users  may  be  interleaved  in  any 
order.  Assume  that  each  transaction  preserves  the  consistency  of  the 
database when executed alone. The execution of many interleaved transactions
may turn a consistent database into an inconsistent one
(see [EGLT76], [GRAY78], [BG80]). For example, suppose the present value
of x is 10 and two transactions T_1 and T_2 execute the following program:
increase x by 1. If T_1 and T_2 are executed one after another, the new
value of x is 12. However, if they are executed in the following interleaved
order:

(1) T_1 reads x = 10

(2) T_2 reads x = 10

(3) T_1 increases x by 1, x = 11

(4) T_2 increases x by 1, x = 11

then the new value of x is 11, which is incorrect. Other examples of concurrency
anomalies can be found in [EGLT76], [GRAY78]. It is the task
of the concurrency control mechanism of the database system to safeguard
database consistency.
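
The lost update in the example above can be reproduced with a few lines of
code. The sketch below is only an illustration of the read, compute and
write steps; the `run` function, the schedule format and the dictionary used
as a database are assumptions of this example and not part of any system
described here.

```python
def run(schedule, x=10):
    """Execute a schedule of (transaction, step) pairs on data item x.

    Each transaction reads x into a private workspace, increments the
    workspace copy, and writes it back, following the three-step model.
    """
    database = {"x": x}
    workspace = {}
    for txn, step in schedule:
        if step == "read":
            workspace[txn] = database["x"]
        elif step == "compute":
            workspace[txn] += 1
        elif step == "write":
            database["x"] = workspace[txn]
    return database["x"]


# T1 runs to completion before T2 starts.
serial = [("T1", "read"), ("T1", "compute"), ("T1", "write"),
          ("T2", "read"), ("T2", "compute"), ("T2", "write")]

# Both transactions read x before either one writes it back.
interleaved = [("T1", "read"), ("T2", "read"),
               ("T1", "compute"), ("T1", "write"),
               ("T2", "compute"), ("T2", "write")]

serial_result = run(serial)            # 12
interleaved_result = run(interleaved)  # 11, one increment is lost
```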

2.2  Serializability 

The notion of correctness in this thesis is that of serializability.
A set of transactions executes serially if each transaction executes its
write step before the next transaction executes its read step. That is,
the transactions are not interleaved. A serial execution of transactions
preserves database consistency since each transaction, by assumption, maps
the original consistent database to another consistent database. A sequence of
interleaved transactions is serializable if it produces the same final state
as a serial execution of those same transactions. Since a serial execution
preserves consistency, a serializable execution also preserves
consistency.

Serializability is an appealing correctness criterion, since it can
be guaranteed without having the database system know the precise
computation performed by each transaction. It need only know the portions
of the database read and written by each transaction (see, e.g., [BSW79],
[KP79]). Other criteria of correctness have been proposed (see, e.g.,
[GLPT75]), but serializability is the most popular among researchers
in the area ([EGLT76], [RSL78], [BSW79], etc.).
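
The definition of serializability can be made concrete with a brute-force
check: run the interleaved schedule, then compare its final state with that
of every serial order. The sketch below is an illustration under several
assumptions: the function names and the schedule format are invented for this
example, the compute step is folded into the write step, and only one initial
state is examined. Unlike a real concurrency control mechanism, it also needs
to know the computation performed by each transaction.

```python
from itertools import permutations


def final_state(schedule, programs, initial):
    """Run a schedule of (txn, step) pairs; steps are 'read' or 'write'.

    programs[txn] = (readset, compute), where compute maps the values read
    to a dict of new values to write back.
    """
    db = dict(initial)
    workspace = {}
    for txn, step in schedule:
        readset, compute = programs[txn]
        if step == "read":
            workspace[txn] = {item: db[item] for item in readset}
        else:  # "write"
            db.update(compute(workspace[txn]))
    return db


def is_serializable(schedule, programs, initial):
    """True if the schedule ends in the same state as some serial order."""
    target = final_state(schedule, programs, initial)
    for order in permutations(programs):
        serial = [(t, s) for t in order for s in ("read", "write")]
        if final_state(serial, programs, initial) == target:
            return True
    return False


increment_x = (("x",), lambda ws: {"x": ws["x"] + 1})
increment_y = (("y",), lambda ws: {"y": ws["y"] + 1})
start = {"x": 10, "y": 0}

# Both transactions read before either writes.
schedule = [("T1", "read"), ("T2", "read"), ("T1", "write"), ("T2", "write")]
conflicting = is_serializable(schedule, {"T1": increment_x, "T2": increment_x}, start)
disjoint = is_serializable(schedule, {"T1": increment_x, "T2": increment_y}, start)
```

The same interleaving is not serializable when both transactions update x,
but is serializable when they touch different data items.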

2.3  Two-Phase Locking

2.3.1  Specification 

In two-phase locking, every data item has a lock associated with it.
At any time, no more than one transaction can hold the lock on a data
item. If a transaction requests locks on all needed data items before
starting and releases the locks after completion, serializability is
preserved. "Two-phase" refers to the requirement that locks be
acquired in a growing phase and released in a shrinking phase. It is
important that once a transaction has released a lock, it will not obtain
any more locks. The point in time at the end of the growing phase is
called the locked point [BSW79]. The serialization order under two-phase
locking is induced by the locked points. Several authors have proved that
two-phase locking guarantees serializability ([BSW79], [EGLT76]).
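
The two-phase requirement is easy to state as a check on the sequence of
lock and unlock actions issued by one transaction. The sketch below is a
minimal illustration; the action format and the names `is_two_phase` and
`locked_point` are assumptions of this example.

```python
def is_two_phase(actions):
    """Return True if no lock is acquired after the first release.

    `actions` is a list of ("lock" | "unlock", item) pairs in the order a
    single transaction issues them.
    """
    shrinking = False
    for kind, _item in actions:
        if kind == "unlock":
            shrinking = True
        elif kind == "lock" and shrinking:
            return False
    return True


def locked_point(actions):
    """Index of the last lock request, i.e. the end of the growing phase."""
    positions = [i for i, (kind, _item) in enumerate(actions) if kind == "lock"]
    return positions[-1] if positions else None


good = [("lock", "x"), ("lock", "y"), ("unlock", "x"), ("unlock", "y")]
bad = [("lock", "x"), ("unlock", "x"), ("lock", "y"), ("unlock", "y")]
```

Here `good` obeys the rule and has its locked point at the second action,
while `bad` acquires a lock on y after releasing x and so is not two-phase.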

One major drawback of two-phase locking algorithms is the possibility
of deadlocks. Deadlocks occur when two transactions are waiting for
each other to release locks on some data items (see section 2.4). There
are different versions of two-phase locking algorithms. The two major
methods are described in the next sections. The solution of the deadlock
problem is quite different for different methods.

2.3.2  Distributed  Two-Phase  Locking 

Locks on data items are managed by a scheduler or lock manager. In
the basic implementation, or Distributed Two-Phase Locking, the schedulers
are distributed with the data. At each database site, there is a scheduler
responsible for the data items at that site.

A transaction reading a data item X can send a read request to any
site containing a copy of X and request the lock on X from the scheduler
at that site. On the other hand, a transaction updating X has
to send write messages to all sites having a copy of X and request
locks from the schedulers at all of those sites*.

This approach is more efficient than the Centralized Two-Phase
Locking Algorithm (described in the next section) in that there is no
central node to serve as a potential bottleneck since the schedulers are
distributed with the data. However, distributing the schedulers
complicates deadlock detection. For example, transaction A
may be waiting for transaction B to release its locks at node 1, while
transaction B may be waiting for transaction A at node 2. There are
no deadlocks locally at nodes 1 and 2. However, the system has a deadlock
and will not be able to detect it unless locking information is communicated
from node to node. This communication will put a heavy burden on
the communication subnetwork. Deadlock detection is therefore infeasible,
and more complex solutions to the deadlock problem must be used.


* One way of handling redundant data is to assume that redundant copies of
a data item X are distinct data items X_1, X_2, ..., X_n. Reads can be processed
at any of the copies, while writes must be implemented at all copies.

2.3.3  Centralized Two-Phase Locking

There  are  many  variations  on  the  basic  implementation,  one  of 
which is Centralized Locking [AD76]. In this case, the scheduler is located
at one site, the central node. Before accessing data at any site, locks
must be obtained at the central node.

This approach has the advantage of central control and relatively
easy deadlock resolution (which will be discussed in the next section).
However, the creation of the central node means that all transactions
have to go through this node, thereby producing a potential bottleneck.
Besides, when the central node fails, the system can no longer function.
Various remedies have been proposed, including the use of multiple central
nodes to solve the bottleneck problem and the use of redundant central
nodes to solve the reliability problem.

2.4  Deadlocks 

In two-phase locking, a transaction is asked to wait when the data
item it accesses is locked by some other transaction. If this waiting
is uncontrolled, a deadlock may result. For example, if transaction A
is waiting for transaction B to release its locks while transaction B
is waiting for transaction A to release its locks, then neither transaction
can complete because neither can get all the data items it requires. Neither
transaction releases its locks, and there is a deadlock. A deadlock is
best illustrated by a waits-for graph [KC74], which is constructed as
follows: For all pairs of transactions A and B, an edge is drawn from
transaction A to transaction B if A is waiting for a lock currently owned
by B. There will be a deadlock in the system iff the waits-for graph
contains a cycle. To solve the deadlock problem, we can employ either
deadlock detection or deadlock prevention.

2.4.1  Deadlock Detection

In  deadlock  detection,  transactions  are  allowed  to  wait  for  locked 
items  in  an  uncontrolled  manner.  Periodically,  a  deadlock  detector 
constructs  the  waits-for  graph  of  the  system  and  determines  whether  there 
are  any  deadlocks  by  searching  for  cycles  in  that  graph.  If  a  cycle  is 
found, one of the transactions in the cycle is restarted. Ideally,
enough information will be available to allow one to choose the
cycle-breaking transaction intelligently, so that the amount of resources
wasted is minimized. Unless all locking information is available at one
node,  as  in  the  Centralized  Locking  Algorithm,  deadlock  detection  is  not 
practical  because  of  the  communication  of  locking  information  that  is 
necessary. 
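
The cycle search performed by a deadlock detector can be sketched as a
depth-first traversal of the waits-for graph. The code below assumes that
the complete graph is available at one node, as in the Centralized Locking
Algorithm; the dictionary representation and the name `find_cycle` are
assumptions of this example, and the choice of which transaction to restart
is left open.

```python
def find_cycle(waits_for):
    """Return a list of transactions forming a cycle, or None.

    `waits_for` maps each transaction to the set of transactions that hold
    locks it is waiting for (an edge A -> B means A waits for B).
    """
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on current path, finished
    colour = {}
    path = []

    def visit(node):
        colour[node] = GREY
        path.append(node)
        for nxt in waits_for.get(node, ()):
            state = colour.get(nxt, WHITE)
            if state == GREY:                  # back edge closes a cycle
                return path[path.index(nxt):]
            if state == WHITE:
                found = visit(nxt)
                if found:
                    return found
        colour[node] = BLACK
        path.pop()
        return None

    for start in list(waits_for):
        if colour.get(start, WHITE) == WHITE:
            found = visit(start)
            if found:
                return found
    return None


deadlocked = {"A": {"B"}, "B": {"A"}}            # A and B wait for each other
acyclic = {"A": {"B"}, "B": {"C"}, "C": set()}   # a chain of waits, no deadlock
```

Any transaction in the returned cycle is a candidate for restart.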

2.4.2  Deadlock  Prevention 

In deadlock prevention, transactions are allowed to wait for locked
items only in a controlled manner, to eliminate the possibility of deadlocks.

Ordered Queues

One way to prevent deadlocks is to require that all transactions
request locks in some universally specified order, i.e. wait for X first,
then Y, etc. This has the special property that transactions never have
to be restarted.

Prioritized  Transactions 

Another method to prevent deadlocks is to assign a priority to each
transaction and to require that when transactions conflict, only higher
priority transactions can wait for lower priority transactions, or vice
versa. Consider the waits-for graph of such a system. Every edge in the
graph is in priority order, i.e. T_1 -> T_2 -> ... -> T_n. A cycle is a path
from a node to itself and since it is impossible to have a transaction
which has lower priority than itself, no cycles can occur, and the
system is deadlock free.

One convenient way to assign priority is to use timestamps: the
older the timestamp, the higher the priority. Two methods are proposed
by Rosenkrantz et al. [RSL78]. Suppose T_i tries to wait for T_j:

(1) Wait-die System

If T_i has higher priority, it is allowed to wait; if T_i has lower
priority, it is restarted.

When T_i is restarted, it releases all resources that it has
previously locked. The database management system then resubmits
lock requests for T_i. To prevent cyclic restarts, i.e. being
restarted over and over again, T_i retains its original timestamp.

(2) Wound-wait System

If T_i has higher priority, T_j is wounded and T_i waits until the
wound takes effect; if T_i has lower priority, it is allowed to wait.
When a transaction T_j is wounded, wound messages are sent to
all sites where T_j is being processed. If the transaction has
not initiated termination, i.e. the write phase of the two-phase
commit*, then the transaction is restarted. Otherwise, the transaction
is allowed to finish because in this case there is no danger
of deadlock.


*  See  section  4.3  for  a  description  of  the  two-phase  commit  algorithm. 

In the wait-die system, younger transactions are restarted when they
conflict with older transactions, while in the wound-wait system, the younger
transactions are allowed to wait. Younger transactions arrive into the
system later than older transactions, and hence are likely to arrive at a
given node later too. Therefore, the majority of conflicts in the system
are of the type: younger transactions waiting for older ones. Hence, under
wait-die, most conflicting transactions will be restarted, while under
wound-wait, most are allowed to wait. Since waiting presumably consumes fewer
resources than restarting, wound-wait seems more efficient than wait-die.
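
The two rules can be summarized as decision functions on the timestamps of
the requesting transaction and of the transaction holding the lock. This is
a sketch of the rules as stated above, with an older (smaller) timestamp
meaning higher priority; the function names and the returned strings are
assumptions of this example, and the handling of a wounded transaction that
has already begun to terminate is not modelled.

```python
def wait_die(requester_ts, holder_ts):
    """Wait-die: an older requester waits, a younger requester restarts."""
    if requester_ts < holder_ts:          # requester is older
        return "requester waits"
    return "requester restarts"           # it keeps its original timestamp


def wound_wait(requester_ts, holder_ts):
    """Wound-wait: an older requester wounds the holder, a younger one waits."""
    if requester_ts < holder_ts:          # requester is older
        return "holder wounded, requester waits"
    return "requester waits"


# The common case described in the text: a younger transaction (timestamp 7)
# finds a data item locked by an older one (timestamp 3).
common_case = (wait_die(7, 3), wound_wait(7, 3))
```

In the common case the younger requester is restarted under wait-die but
simply waits under wound-wait, which is the basis of the efficiency argument
above.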

2.5  Timestamp Ordering

2.5.1  Specification 

In timestamp ordering, each transaction is assigned a timestamp.
This timestamp is guaranteed to be globally unique by ensuring that no
new timestamp will be assigned at the same site until the clock has ticked,
and by appending the site number to the low order bits of the timestamp.
The timestamp is attached to all read and write operations issued on behalf
of that transaction. Each data item has a timestamp equal to that of the
last write operation. The database management system is required to process
the transactions in timestamp order. This is accomplished by transaction
restarts and transaction blocking. Suppose a transaction sends a RR
(request to read message) with timestamp TS_1 to site u to read data
item X (with timestamp TS_0). If TS_1 < TS_0, the transaction must be restarted
and resubmitted with a new (and bigger) timestamp. If TS_0 < TS_1, the RR
must wait until all write operations with timestamps less than TS_1 have been
processed. Therefore, the RR must wait for the arrival of a write operation
(with timestamp TS_2) such that TS_2 > TS_1. Since this write operation may
never arrive, the wait may be infinite. To alleviate this problem, the
database system has to generate periodically null-write messages, i.e. write
messages with only a timestamp but no new value for the data item, from each
site.

It can be shown that timestamp ordering guarantees serializability
[BG80] and the serialization order is the timestamp order.
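
The construction of globally unique timestamps described in this section can
be sketched as follows. The class name, the 8-bit site field and the integer
clock reading are assumptions of this example. Where the text has a site
wait for its clock to tick before issuing another timestamp, the sketch
instead advances to the next tick value, which keeps timestamps unique at a
site without blocking.

```python
class TimestampGenerator:
    """Issues globally unique timestamps for one site.

    The local clock value occupies the high-order bits and the site number
    the low-order bits, so ties between sites are broken by site number.
    """

    SITE_BITS = 8  # hypothetical width; allows up to 256 sites

    def __init__(self, site_number):
        if not 0 <= site_number < (1 << self.SITE_BITS):
            raise ValueError("site number does not fit in SITE_BITS")
        self.site_number = site_number
        self.last_tick = -1

    def next_timestamp(self, clock_reading):
        # Never reuse a tick at this site: if the clock has not advanced
        # since the last timestamp, move to the next tick value instead.
        tick = max(clock_reading, self.last_tick + 1)
        self.last_tick = tick
        return (tick << self.SITE_BITS) | self.site_number
```

Two sites reading the same clock value still obtain different timestamps,
because the site numbers in the low-order bits differ.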

2.5.2  SDD-1 

SDD-1 ([BG80], [BGR80]) is a specialized version of a timestamp
ordering (T/O) synchronization algorithm. The basic T/O algorithm seeks
to guarantee serializability by re-ordering transactions so that they
will be processed in timestamp order. The algorithm is applied to all
transactions, regardless of whether conflicts do indeed exist. SDD-1 tries
to improve on the basic T/O algorithm by incorporating two new features:

(1) transaction classes - a transaction class is defined by its readset
and its writeset. Conflicts between transactions can be determined
by conflicts between their respective classes.

(2) conflict graph - conflicts between classes can be analyzed during
database design. It is determined that four protocols (P_1, P_2, P_3, P_4),
with varying costs of synchronization, are sufficient to guarantee
serializability.

In defining transaction classes, transactions that arrive at different
TM's (Transaction Modules) will be designated as different transaction
classes, even if their readsets and writesets are the same. Moreover,
SDD-1 assumes that messages between each pair of sites will be sent and
arrive at their destinations in timestamp order.

The basic architecture of SDD-1 is shown in Fig. 2.1. It consists
of Transaction Modules (TM's) and Data Modules (DM's). The TM supervises
transaction executions while the DM manages the data at the local site.
TM's and DM's may be located together at the same computer site or reside
at different sites.

The  algorithm  is  best  illustrated  by  an  example  (Fig.  2.2).  Suppose 

a  transaction  i  arrives  at  TM  .  To  process  this  transaction,  TM  follows 

a  a 

the  following  steps: 

(1)  Look  in  class  table  and  find  the  class  of  i,  say  i. 

(2)  Determine  the  conflicting  transaction  classes  j,  k,  etc.,  and  the 
required  synchronization  protocols. 

(3)  Query  Processing  :  TM^  devises  a  query  processing  strategy  to  read 

data  at  the  DM's  and  produce  the  result  at  TM  .  Suppose  the  strategy 

requires  reading  a  data  item  at  DM  ,  then  TM  sends  a  RR  (request  to 

ct  a 

read)  message  to  DM^  with  a  Read  Condition  <TS^,  (j,  k,  ...)>.  The 

read  condition  identifies  the  conflicting  classes  j,  k,  etc.  and  the 

I 

timestamp  of  i  (TS^ ) .  When  the  read  message  arrives  at  DM^ ,  the 
synchronization  protocol  dictates  that  it  must  be  processed  after  all 
write  messages  from  conflicting  classes  (i.e.  j,  k,  etc.)  with  timestamps 
less  than  TS^,  but  before  those  write  messages  with  timestamps  greater 
than  TS ^ .  Hence,  if  the  read  message  (with  timestamp  TSJ  arrives  at 
DM  later  than  a  write  message  (with  timestamp  TS^)  generated  from  anothi  r 
TM,  where  TS  >  TS . ,  then  the  read  message  cannot  satisfy  the  required 
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Figure  2.1  Basic  Architecture  of  SDD-1 
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synchroni zat it n  protocol  and  must  be  rejected  and  transaction  i  restarted 
by  TM  with  a  later  timestamp.  Note  that  any  other  read  messages  on 
behalf  of  transaction  i  have  to  be  resubmitted  with  this  new  timestamp. 

The scenario described above, namely messages arriving at DM_α in reverse order to their timestamps, is possible because of different transmission delays on different communication channels. If the read message is not rejected, then it must wait until its read condition is satisfied, i.e. it must wait for the arrival of a write message or a nullwrite message with timestamp greater than TS_i from all conflicting transaction classes. These write messages must then wait for the read to take place before they can be implemented.

(4) Write: TM_a sends RW (request to write) messages to all DM's containing data items that have to be updated. A RW consists of the new value of the data item and a timestamp equal to that of transaction i. All write messages can be implemented as soon as they are received at the DM's. The rule is as follows: If the timestamp of the RW is smaller than that of the stored data item it tries to write, the RW is ignored. Otherwise, the value and timestamp of the stored data item will be updated, after pending reads, if any, are processed. This is known as the Thomas Write Rule [THOM79]. Nullwrites are treated differently from RW's in that they do not update the timestamp of the data item.
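The read condition of step (3) and the write rule of step (4) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not SDD-1 code: the names `DataItem`, `apply_write` and `read_is_rejected` are hypothetical, timestamps are assumed to be integers, and the queueing of pending reads is left out.

```python
from dataclasses import dataclass


@dataclass
class DataItem:
    value: object
    timestamp: int  # timestamp of the write that produced `value`


def apply_write(item, new_value, ts, is_nullwrite=False):
    """Apply a RW message by the write rule; return True if the item changed."""
    if is_nullwrite:
        # Nullwrites carry no new value and do not update the stored timestamp.
        return False
    if ts < item.timestamp:
        # The RW is older than the stored data item, so it is ignored.
        return False
    item.value, item.timestamp = new_value, ts
    return True


def read_is_rejected(read_ts, arrived_write_ts):
    """A read is rejected if a write from a conflicting class with a larger
    timestamp has already arrived at the DM (hypothetical bookkeeping:
    `arrived_write_ts` lists the timestamps of those writes)."""
    return any(w > read_ts for w in arrived_write_ts)
```

A rejected read corresponds to the restart described in step (3); a read that is not rejected would still have to wait until its read condition is satisfied.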

2.6  Other  Concurrency  Control  Algorithms 

Our discussion of concurrency control algorithms would not be complete without at least a brief description of Thomas' Majority Consensus Algorithm [THOM79] and Ellis' Ring Algorithm [ELLI77]. These two algorithms are among the earliest algorithms proposed for concurrency control. However, they were devised for fully redundant databases and can be shown to be rather inefficient ([BG80]). Therefore, we shall not attempt to model them.

2.6.1  Thomas'  Majority  Consensus  Algorithm 

Thomas devised an algorithm for fully redundant databases. Reads are
processed at the site where the transaction originates. A proposed write, or
update, is passed from site to site. Each site will vote yes, no or
pass  on  the  update.  When  a  majority  of  sites  have  voted  yes,  the  update 
is  installed.  Voting  has  therefore  effectively  replaced  locking.  The 
locked  point,  using  two-phase  lock  terminology,  is  reached  when  the  site 
originating  the  update  has  received  a  majority  of  yes  votes. 

This  algorithm  is  impractical  since  it  assumes  a  fully  redundant 
database.  Even  for  fully  redundant  databases,  it  is  inefficient  since 
conflicts between transactions must be resolved by restarts, in contrast to locking and timestamp ordering algorithms, which can also resolve conflicts by transaction blocking (i.e. waiting for locks to be released or read conditions to be satisfied).

2.6.2 Ellis' Ring Algorithm

In Ellis' Algorithm, the database is fully redundant and the communication subnetwork is configured as a ring. To effect an update, a transaction moves successively from site to site on the ring, obtaining a lock on the entire database at each site. This means that all transactions must be executed serially, and no parallel processing is possible. Hence, the algorithm is inefficient.
CHAPTER 3

COMMUNICATION  SUBNETWORK  MODEL 

Since the distributed database is managed on a communication network, and we can think of transactions (i.e., queries and updates) as competing among themselves for the available resources of the network, it seems natural to model the communication network by a network of queues. We have attempted to employ Jackson's Queueing Networks since, provided its assumptions are satisfied, the resulting model is very powerful. However, we have found that Jackson's model is inadequate for our purposes. In this chapter, we introduce Jackson's model, describe how it can be used to model a communication subnetwork and point out its inadequacies. We then propose a new "independent queues" model of a communication subnetwork. This model is adopted in this thesis for the analysis of communication delay and is described in section 3.3.

3.1 Jackson's Network of Queues

3.1.1 Basic Model

The model introduced in Jackson [JACK57] is an arbitrary network of queues, consisting of N nodes where the ith node consists of $m_i$ exponential servers, a first-come-first-served queue discipline and unlimited queueing capacity. The external input stream to node i is Poisson with rate $\gamma_i$, and these external input streams are assumed to be independent. The service times at node i are independent and have a common exponential distribution with parameter $\mu_i$, and are also independent of the customer arrivals at node i. A customer leaving node i is immediately and independently routed to node j with probability $p_{ij}$, and he departs the system with probability $q_i = 1 - \sum_{j=1}^{N} p_{ij}$.

If we denote the total arrival rate (including external and internal arrivals) to node i by $\lambda_i$, we see that the following equations must be satisfied:

$$\lambda_i = \gamma_i + \sum_{j=1}^{N} \lambda_j p_{ji}, \qquad i = 1, 2, \ldots, N \qquad (3.1)$$

Let the state variables for this N-node system consist of the vector $(k_1, k_2, \ldots, k_N)$, where $k_i$ is the number of customers (including those in service) at the ith node, and the equilibrium probability associated with this state be denoted by $p(k_1, k_2, \ldots, k_N)$. Similarly, let the marginal probability of finding $k_i$ customers at the ith node be $p_i(k_i)$. Jackson showed that, provided the utilization is less than one at each node, i.e.,

$$\rho_i = \frac{\lambda_i}{m_i \mu_i} < 1, \qquad i = 1, 2, \ldots, N$$

then the joint distribution for all nodes factors into the product of each of the marginal distributions, i.e.,

$$p(k_1, k_2, \ldots, k_N) = p_1(k_1)\, p_2(k_2) \cdots p_N(k_N)$$

and $p_i(k_i)$ is given as the solution of the classical M/M/m system,

$$p_i(k_i) = \begin{cases} p_i(0)\, \dfrac{(m_i \rho_i)^{k_i}}{k_i!}, & 0 \le k_i \le m_i \\[2ex] p_i(0)\, \dfrac{(\rho_i)^{k_i} (m_i)^{m_i}}{m_i!}, & m_i \le k_i < \infty \end{cases} \qquad (3.2)$$

This says that whenever an equilibrium condition exists, each node in the network behaves as if it were an independent M/M/m queue with Poisson input, although in general the total input (including internal transfers) at each node is not Poisson.
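As a minimal sketch of how the traffic equations (3.1) and the M/M/m marginals (3.2) fit together, the Python fragment below solves for the total arrival rates by simple iteration and evaluates the marginal probability at one node. The function names and the iteration scheme are illustrative assumptions; the expression used for $p_i(0)$ is the standard M/M/m normalization, which the text does not write out.

```python
import math


def solve_traffic(gamma, P, tol=1e-12, max_iter=10000):
    """Iterate lambda_i = gamma_i + sum_j lambda_j * P[j][i] (Equ. 3.1)."""
    n = len(gamma)
    lam = list(gamma)
    for _ in range(max_iter):
        new = [gamma[i] + sum(lam[j] * P[j][i] for j in range(n))
               for i in range(n)]
        if max(abs(a - b) for a, b in zip(new, lam)) < tol:
            return new
        lam = new
    raise RuntimeError("traffic equations did not converge")


def mmm_marginal(k, lam, mu, m):
    """p_i(k) of Equ. (3.2) for a node with m servers of rate mu each."""
    rho = lam / (m * mu)
    if rho >= 1:
        raise ValueError("utilization must be less than one")
    # p(0) from the requirement that the probabilities sum to one
    head = sum((m * rho) ** j / math.factorial(j) for j in range(m))
    tail = (m * rho) ** m / (math.factorial(m) * (1 - rho))
    p0 = 1.0 / (head + tail)
    if k <= m:
        return p0 * (m * rho) ** k / math.factorial(k)
    return p0 * rho ** k * m ** m / math.factorial(m)


def joint_probability(ks, lams, mus, ms):
    """Product form: p(k_1..k_N) is the product of the marginals."""
    prob = 1.0
    for k, lam, mu, m in zip(ks, lams, mus, ms):
        prob *= mmm_marginal(k, lam, mu, m)
    return prob
```

For example, with two nodes, external input only at node 1 and half of each node's output fed to the other node, `solve_traffic([1.0, 0.0], [[0, 0.5], [0.5, 0]])` gives total arrival rates close to 4/3 and 2/3.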
3.1.2 General Model

Since Jackson first published his basic model in 1957, there have been many extensions proposed. The most general results have been obtained by Baskett et al. [BCMP75]. Their model consists of N nodes and L classes of customers. Customers travel through the network according to transition probabilities

$$p_{i,r;j,s} = P(\text{next node is } j, \text{ next class is } s \mid \text{current node is } i \text{ and current customer class is } r)$$

where i, j = 1,...,N; r, s = 1,...,L.

The system can be closed for certain classes of customers and open for others. If the system is closed for customers of class r, then the number of such customers within the system is fixed at a constant number k(r). In an open network, the total arrival rate to the network is Poisson with mean rate dependent on the total number of customers in the network. An arrival enters node j in class r with a fixed probability (state independent) of $q_{jr}$.

There are also four kinds of service nodes:

Type 1 Node: The service discipline is first-come-first-served (FCFS) with a single server; all customers have the same negative exponential service time distribution with rate $\mu(i)$, where i is the number of customers at the node.

Type 2 Node: There is a single server at the node, the service discipline is processor sharing (i.e. when there are n customers in the node, each is receiving service at a rate of 1/n times the total service rate), and each class of customer may have a distinct service time distribution. The service time distributions have rational Laplace transforms.

Type 3 Node: The number of servers in the service center is greater than or equal to the maximum number of customers, and each class of customer may have a distinct service time distribution. The service time distributions have rational Laplace transforms.

Type 4 Node: The queueing discipline is preemptive-resume last-come-first-served (LCFS) with a single server. Each customer class may have a distinct service time distribution which must have a rational Laplace transform.

Note that any distribution can be approximated by a distribution with rational Laplace transforms [KLEI75].

The traffic equations for this general Jackson Model are:

$$e_{is} = q_{is} + \sum_{j=1}^{N} \sum_{r=1}^{L} e_{jr}\, p_{j,r;i,s}, \qquad i = 1, \ldots, N; \quad s = 1, \ldots, L \qquad (3.3)$$

where $e_{is}$ is the arrival rate of class s customers to the ith node.

The states of the system involve a rather complex description of the positions of customers, their classes and their stages of attained service. Baskett et al. [BCMP75] proved that the equilibrium probabilities of the aggregate states $S = (y_1, y_2, \ldots, y_N)$ have a product form [...]. Here $y_i = (n_{i1}, \ldots, n_{iL})$, where $n_{ir}$ is the number of customers of class r in service center i. Let $n_i$ be the total number of customers at service center i and let $1/\mu_{ir}$ be the mean service time of a class r customer at service center i. The equilibrium probabilities are given by

$$p(S = (y_1, y_2, \ldots, y_N)) = C\, d(S)\, g_1(y_1) \cdots g_N(y_N) \qquad (3.4)$$

where

$$g_i(y_i) = n_i! \left\{ \prod_{r=1}^{L} \frac{1}{n_{ir}!} (e_{ir})^{n_{ir}} \right\} \left( \frac{1}{\mu_i} \right)^{n_i} \quad \text{for Type 1 nodes,}$$

$$g_i(y_i) = n_i! \prod_{r=1}^{L} \frac{1}{n_{ir}!} \left( \frac{e_{ir}}{\mu_{ir}} \right)^{n_{ir}} \quad \text{for Type 2 or 4 nodes,}$$

$$g_i(y_i) = \prod_{r=1}^{L} \frac{1}{n_{ir}!} \left( \frac{e_{ir}}{\mu_{ir}} \right)^{n_{ir}} \quad \text{for Type 3 nodes.}$$

This result is remarkable not only because of the product form exhibited, but also because of the fact that general service time distributions (with rational Laplace transforms) for the different classes of customers yield the same result as exponential service time distributions, since only the mean service rate is included in the results. (See Equation (3.4).)


[...]
3.2.1 Model Description

Suppose we want to model a communication subnetwork by Jackson's Basic Model. The modeling process can best be illustrated by an example.

Consider the simple network shown in Figure 3.1, which contains three computer sites connected by directed communication links. Messages (which can be update, query, file transfer, or acknowledgement messages) enter the different nodes destined for other nodes. The quantity $\gamma_{ij}$ is the rate of arrival of messages at node i destined for node j. $C_{ij}$ is the capacity of the channel from i to j. If we assume that the message arrival is Poisson, that the message lengths are exponentially distributed with mean $1/\mu$, and that the independence assumption holds, i.e., each time a message enters a new channel, a new length is chosen independently from the exponential pdf, we can then model this network by the basic Jackson's model. Each channel corresponds to a node in Jackson's model*.

It is further assumed that each channel serves messages (customers) in a FCFS queue discipline with unlimited queueing capacity and an exponential service time with mean $1/(\mu C_i)$, where $C_i$ is the capacity of the channel. A message leaving channel i is immediately routed to channel j with probability $p_{ij}$, and departs the system with probability $q_i$. Figure 3.2 shows this simple three-node network example again (with the paths taken by the different messages indicated by broken lines) and its equivalent Jackson's model.


Consider, for example, channel $C_{12}$: both $\gamma_{12}$ and $\gamma_{13}$ messages use this channel. When they reach computer site 2, the $\gamma_{12}$ messages, corresponding to a fraction $\gamma_{12}/(\gamma_{12} + \gamma_{13})$ of the total messages passing through channel $C_{12}$, leave the system while the rest are transferred to channel $C_{23}$.

*[...] We shall consider the servers in Jackson's model to be the channels.

Figure 3.1 A Three-Node Communication Network

Similarly,  we  can  find  the  routing  probabilities  for  the  other  channels. 
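As a hedged illustration of how such routing probabilities arise, the sketch below starts from end-to-end traffic rates $\gamma_{ij}$ and one fixed path per origin-destination pair, and derives each channel's total arrival rate, the fraction of its traffic that moves on to a following channel, and the fraction that leaves the system. The function name `channel_flows` and the example paths are hypothetical; the actual paths of Figure 3.2 are not reproduced here.

```python
from collections import defaultdict


def channel_flows(gamma, paths):
    """gamma[(i, j)]: arrival rate of i-to-j messages.
    paths[(i, j)]: ordered list of channels (k, l) used by those messages."""
    load = defaultdict(float)      # total arrival rate on each channel
    transfer = defaultdict(float)  # rate moving from one channel to the next
    leave = defaultdict(float)     # rate leaving the system after a channel
    for pair, rate in gamma.items():
        route = paths[pair]
        for hop, ch in enumerate(route):
            load[ch] += rate
            if hop + 1 < len(route):
                transfer[(ch, route[hop + 1])] += rate
            else:
                leave[ch] += rate
    # Routing probability from channel a to channel b, and departure probability
    p = {(a, b): r / load[a] for (a, b), r in transfer.items()}
    q = {ch: r / load[ch] for ch, r in leave.items()}
    return dict(load), p, q
```

With `gamma = {(1, 2): 2.0, (1, 3): 1.0}` and the 1-to-3 messages routed over channels (1, 2) and then (2, 3), the departure probability after channel (1, 2) comes out as $\gamma_{12}/(\gamma_{12} + \gamma_{13}) = 2/3$.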

3.2.2  Assumptions 

It is noted that in order to use the basic Jackson's model, it is necessary to make the following assumptions, which approximate reality with varying degrees of success:

(1) Poisson message arrivals.

(2) One class of message with exponentially distributed message lengths.

(3) Stationarity and independence of the stochastic processes (1) and (2).

(4) First-come-first-served queueing discipline.

(5) Independence of service times at different communication channels.

(6) Random routing of messages, irrespective of message destinations.

(7) Unlimited queueing capacity at each node.

(8) Noiseless channels and perfectly reliable nodes and channels.

How well does Jackson's model approximate the communication subnetwork of interest?

The author is not aware of any empirical studies to justify assumptions (1) to (3) for a communication network. However, certain data obtained by Molina [MOLI27] for telephone traffic correspond well to these assumptions.

Assumption (2) is unrealistic. Since we are interested in the communication subnetwork of a DDB, there will be different classes of messages with different lengths. At the very least, we would like to distinguish between long messages such as file transfers and short messages such as lock requests and lock grant messages. The basic Jackson's model only allows one class of message. The general Jackson's model does allow different classes of messages for queueing disciplines other than FCFS. However, in a store-and-forward network, the queueing discipline is best described as FCFS.

Assumption (4) is relaxed in the general Jackson's model, and other queueing disciplines as described in section 3.1.2 are allowed. In particular, one can model statistical multiplexing by a Type 2 service node (processor sharing) and some preemption strategies by a Type 4 service node.


Assumption (5) was first introduced by Kleinrock [KLEI64]. In Jackson's model the service time at each node is an independent random variable, while in Kleinrock's model, as in our communication network model, the service time for a given message on channel i depends on the message length b and the capacity $C_i$ of the channel, i.e. $b/C_i$. The service times at different channels are therefore not independent. Besides, there is a dependency between the interarrival times and lengths of adjacent messages as they travel within the network. (See, for example, Equation (3.4) in Kleinrock [KLEI64].) To alleviate this problem Kleinrock introduced the Independence Assumption:

Each time a message is received at a node within a network, a new length b is chosen independently from the pdf:

$$p(b) = \mu e^{-\mu b}, \qquad b \ge 0$$

Kleinrock has treated this assumption extensively in [KLEI64], which includes a computer simulation model, and argued that so long as there are multiple channels coming into and leaving a communication node, this assumption is reasonable.
Assumption (6) says that once a message has finished transmission at channel i, it will be routed to channel j with a certain probability $p_{ij}$, irrespective of its destination. This assumption is invalid for most networks we are interested in, and it is another reason why we believe Jackson's model is inadequate. Consider, for example, the ARPANET, where messages are routed in the network over different channels according to their destinations.

Assumption  (7)  is  unrealistic  for  networks  with  limited  buffer 
space  such  as  the  ARPANET.  However,  with  the  development  of  cheaper 
and  smaller  core  memories,  buffer  space  can  be  practically  unlimited 
in  the  near  future. 

Assumption (8) is realistic if one assumes that there is a layer of software, superimposed on the communication subnetwork, which is responsible for detecting errors in transmission (e.g. using the cyclic redundancy check) and for retransmission of noisy messages. Of course, in this case the service time has to be adjusted to reflect the increased volume of traffic due to retransmissions.

It is mainly because of Assumptions (2), (5) and (6) that we have decided that Jackson's model is inadequate for modelling a communication subnetwork. In the next section, we shall propose a new Independent Queues Model that will alleviate these difficulties.

3.3 Independent Queues Model of Communication Subnetwork

3.3.1  Model  Description 

The  model  is  based  on  the  Independent  Queues  Assumption*: 

*This assumption is suggested by Prof. Robert Gallager.
Consider a communication subnetwork with N channels. Suppose the total arrival rate of messages to channel i is $\lambda_i$. $\lambda_i$ includes both the external arrival rate of messages at channel i, which is Poisson, and the internal transfer from other channels to channel i according to some routing strategies. Let $p(k_1, k_2, \ldots, k_N)$ be the equilibrium joint pmf of the number of messages at the different channels, and $p_i(k_i)$ be the equilibrium marginal pmf of the number of messages at channel i under Poisson input rate $\lambda_i$. Then, provided the utilization at each channel is less than one, it is assumed that

$$p(k_1, k_2, \ldots, k_N) = p_1(k_1)\, p_2(k_2) \cdots p_N(k_N)$$

In particular, if the message lengths are exponential, the Independent Queues Assumption says that, in equilibrium, channel i behaves as if it were an independent M/M/1 queue with Poisson input rate $\lambda_i$.

The Independent Queues Model allows us to calculate the message delay for the network very easily. The average message delay $T_i$ on channel i is $1/(\mu C_i - \lambda_i)$*, where $C_i$ is the capacity of channel i, $\lambda_i$ is the total arrival rate of messages to channel i, and $1/\mu$ is the average length of messages.

*Average service time for an M/M/1 queue with arrival rate $\lambda_i$ and service rate $\mu C_i$.

Consider the average number of messages (both in queue and in service) in channel i in equilibrium, $n_i$. Applying Little's Formula [LITT61], this is just $\lambda_i T_i$. Applying Little's Formula again for the whole network, and denoting the average number of messages in the whole network by n, we have

$$\gamma T = n = \sum_i n_i = \sum_i \lambda_i T_i$$

where $\gamma$ = sum of external arrival rates to the network and T = average message delay for the whole network. Then

$$T = \frac{1}{\gamma} \sum_i \frac{\lambda_i}{\mu C_i - \lambda_i} \qquad (3.5)$$

Note that the derivation of Equ. (3.5) does not require the service times at the different channels to be independent. The only requirement is that the service time at each channel i can be approximated as a function of the input traffic $\lambda_i$.
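The average network delay of Equ. (3.5) is straightforward to evaluate numerically. The following Python sketch does so for given channel loads; the function name and the argument layout are illustrative assumptions, and the check on utilization simply enforces the condition that each channel's utilization is less than one.

```python
def average_message_delay(lams, caps, mu, gamma):
    """Equ. (3.5): T = (1/gamma) * sum_i lam_i / (mu * C_i - lam_i).

    lams:  total arrival rate of messages to each channel
    caps:  capacity C_i of each channel
    mu:    reciprocal of the average message length
    gamma: sum of the external arrival rates to the network
    """
    total = 0.0
    for lam, cap in zip(lams, caps):
        if lam >= mu * cap:
            raise ValueError("channel utilization must be less than one")
        total += lam / (mu * cap - lam)
    return total / gamma
```

For instance, a message stream of rate 1 crossing two channels of capacity 3 in tandem (with $\mu = 1$) spends on average 1/2 at each channel, so `average_message_delay([1.0, 1.0], [3.0, 3.0], 1.0, 1.0)` evaluates to 1.0.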

Our model ignores the propagation delay for the energy of a bit to travel from one end of a channel to the other. Even though the speed of propagation is a significant fraction of the speed of light, the propagation time may be significant if the path is long. In addition, there is the delay introduced during local processing at each of the channels. Let $P_i$, $K_i$ be the propagation and local processing delays associated with the ith channel; then our average message delay given above must be rewritten as:

$$T = \frac{1}{\gamma} \sum_i \lambda_i \left[ \frac{1}{\mu C_i - \lambda_i} + P_i + K_i \right] \qquad (3.6)$$


3.3.2  Assumptions 

The  Independent  Queues  Model  of  a  communication  subnetwork  requires 
the  following  assumptions: 

(1)  Poisson  message  arrivals. 

(2) First-come-first-served queueing discipline.

(3)  Independent  Queues  Assumption. 

(4)  Unlimited  queueing  capacity  at  each  node. 

(5)  Noiseless  channels  and  perfectly  reliable  nodes  and  channels. 
Assumptions (1), (2), (4) and (5) are also necessary for Jackson's model. These assumptions have been discussed in section 3.2.2 and will not be repeated here. Assumption (3), like Kleinrock's Independence Assumption, is hard to justify. The rationale for this assumption is that it makes the problem analytically tractable and seems to give reasonable results. While this assumption seems to be stronger than the Independence Assumption, it affords us more flexibility. For example, we can now model general message length distributions. Instead of independent M/M/1 queues, we then obtain independent M/G/1 queues. We can also model different classes of messages using the M/G/1 queues. In addition, we can model more accurately the routing of messages in the network.
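To make the M/G/1 remark concrete, the sketch below computes the mean delay at one channel for a general message length distribution, and shows how several message classes can be folded into a single length distribution by mixing their moments. It relies on the standard Pollaczek-Khinchine mean-value formula, which the text does not state and which is brought in here as an assumption about how the M/G/1 delay would be evaluated; the function names are hypothetical.

```python
def mg1_delay(lam, mean_len, second_moment_len, capacity):
    """Mean time at an M/G/1 channel (waiting plus transmission)."""
    x1 = mean_len / capacity                  # mean transmission time
    x2 = second_moment_len / capacity ** 2    # second moment of transmission time
    rho = lam * x1
    if rho >= 1:
        raise ValueError("channel utilization must be less than one")
    # Pollaczek-Khinchine: mean wait = lam * E[x^2] / (2 * (1 - rho))
    return x1 + lam * x2 / (2 * (1 - rho))


def mixed_moments(classes):
    """classes: list of (fraction, mean_len, second_moment_len), one per
    message class; fractions are assumed to sum to one."""
    m1 = sum(f * a for f, a, _ in classes)
    m2 = sum(f * b for f, _, b in classes)
    return m1, m2
```

With exponential lengths (mean $1/\mu$, second moment $2/\mu^2$) the result reduces to the M/M/1 value $1/(\mu C_i - \lambda_i)$, which gives a quick consistency check against the delay used in Equ. (3.5).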

3.4 End-to-end Transmission Delay

A key parameter in our DDB model is the end-to-end delay, which is the elapsed time from the transmission of a message at its source to the delivery of the message at its destination.

Since messages typically preserve their length as they traverse the system, the interarrival and service sequences at each communication channel, and the service times at successive communication channels, are dependent. The distribution of the end-to-end delay is thus mathematically intractable. A number of results concerning this delay are presented by Calo in [CALO80]. These include: ordering relations for the successive waiting times in the channel; waiting time properties under extreme conditions; and simple bounds for systems with uniformly bounded service processes. However, these results are not useful to us in the DDB model. In fact, even if we make the Independent Queues Assumption, the distribution of end-to-end delay still eludes us.

Recall that if we make the Independent Queues Assumption, the number of messages, and hence the service time, at all channels are independent in equilibrium. This means that at a particular time, say t, the service times at all channels are independent. However, when a message is traversing the system from one channel to another, it will be receiving service at successive channels at different times. The service times of a particular message at successive channels are thus dependent. It is this dependence that makes the analysis difficult.

In  order  to  analyze  the  end-to-end  delay,  approximations  are  made. 

3.4.1  Exponential  End-to-End  Delay 

Assume  that  the  end-to-end  delay  is  exponential.  This  will  be 
true  if  the  delay  at  one  channel  dominates  the  total  delay.  Since  an 
exponential  distribution  is  completely  characterized  by  its  expected 
value,  once  we  find  the  expected  value  of  the  end-to-end  delay,  we  have 
determined  the  distribution. 

Consider a communication subnetwork with N nodes and M channels. Denote the channel from node k to node $\ell$ by the tuple $(k,\ell)$, where $k, \ell = 1, \ldots, N$. Let C denote the set of all channel tuples, i.e.

$$C = \{ (k,\ell) \mid k, \ell = 1, \ldots, N \}.$$

Let

$D_{k\ell}$ = service time at channel $(k,\ell)$,

$T_{ij}$ = end-to-end delay from node i to node j,

$\lambda_{ij}$ = Poisson arrival rate of messages at node i destined for node j,

$f_{ij}(k)$ = fraction of i-j traffic passing through node k,

$g_{ij}(k,\ell)$ = fraction of i-j traffic passing through channel $(k,\ell)$,

$\phi_{ij}(k,\ell)$ = routing variable, fraction of i-j traffic at node k that is routed on channel $(k,\ell)$ ($0 \le \phi_{ij}(k,\ell) \le 1$, $\sum_{\ell} \phi_{ij}(k,\ell) = 1$ for all k),

$L_{ij}$ = average number of i-j messages in the network,

$L_{ij}(k,\ell)$ = average number of i-j messages at channel $(k,\ell)$.


Little's Formula [LITT61] gives, with $T_{ij}$ and $D_{k\ell}$ taken as average values,

$$\lambda_{ij} T_{ij} = L_{ij} = \sum_{(k,\ell) \in C} L_{ij}(k,\ell) = \sum_{(k,\ell) \in C} \lambda_{ij}\, g_{ij}(k,\ell)\, D_{k\ell} = \sum_{(k,\ell) \in C} \lambda_{ij}\, f_{ij}(k)\, \phi_{ij}(k,\ell)\, D_{k\ell}$$

Hence,

$$T_{ij} = \sum_{(k,\ell) \in C} f_{ij}(k)\, \phi_{ij}(k,\ell)\, D_{k\ell} \qquad (3.7)$$

where

$$f_{ij}(k) = \sum_{\ell=1}^{N} f_{ij}(\ell)\, \phi_{ij}(\ell,k) \qquad (3.8)$$

If the routing is loop-free, for each origin-destination pair (i,j), Equ. (3.8) can be solved recursively for $f_{ij}(k)$, k = 1,...,N. Therefore, the average end-to-end delay for all nodal pairs can be found using Equ. (3.7) and (3.8).
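The recursive solution of Equ. (3.8) and the evaluation of Equ. (3.7) can be sketched as follows for a single origin-destination pair. The sketch assumes loop-free routing (otherwise the recursion would not terminate) and takes $f_{ij}(i) = 1$ as the starting condition, which the text leaves implicit; the function name and the dictionary layout are illustrative choices.

```python
from functools import lru_cache


def end_to_end_mean_delay(i, j, phi, D):
    """Mean i-to-j delay from Equ. (3.7) and (3.8), loop-free routing assumed.

    phi[(k, l)]: fraction of i-j traffic at node k routed on channel (k, l)
    D[(k, l)]:   mean service time at channel (k, l)
    """
    # Predecessors of each node under the routing for this pair
    preds = {}
    for (k, l) in phi:
        preds.setdefault(l, []).append(k)

    @lru_cache(maxsize=None)
    def f(k):
        if k == i:
            return 1.0  # assumption: all i-j traffic enters at node i
        return sum(f(l) * phi[(l, k)] for l in preds.get(k, []))

    return sum(f(k) * frac * D[(k, l)] for (k, l), frac in phi.items())
```

As a small check, if traffic from node 1 to node 4 is split evenly between the paths 1-2-4 and 1-3-4, the result is the traffic-weighted average of the two path delays.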


3.4.2  Normal  End-to-end  Delay 

Another approximation is to assume that the end-to-end delay is normal. Since each $T_{ij}$ is the sum of the service times over several channels, by invoking the Central Limit Theorem, the approximation will be good provided the path corresponding to $T_{ij}$ includes many hops.

Note that in section 3.4.1, the derivation does not require that the service times at the different channels be independent. In the following derivation, however, we assume that the service times at different channels are independent. The notations used are the same as those in section 3.4.1. In addition, let $T_{ij}(t)$ be the pdf of $T_{ij}$, $T^e_{ij}(s)$ be the Laplace transform of $T_{ij}(t)$, $D_{k\ell}(t)$ be the pdf of $D_{k\ell}$, and $D^e_{k\ell}(s)$ be its Laplace transform.

Now,

$$T_{ij} = \begin{cases} D_{ir} + T_{rj} & \text{with probability } \phi_{ij}(i,r), \quad r \ne j \\ D_{ij} & \text{with probability } \phi_{ij}(i,j) \end{cases}$$

Therefore,

$$\begin{aligned} T^e_{ij}(s) &= \sum_{\substack{r=1 \\ r \ne j}}^{N} \phi_{ij}(i,r)\, D^e_{ir}(s)\, T^e_{rj}(s) + \phi_{ij}(i,j)\, D^e_{ij}(s) \\ &= \sum_{r=1}^{N} \phi_{ij}(i,r)\, D^e_{ir}(s)\, T^e_{rj}(s) + \phi_{ij}(i,j)\, D^e_{ij}(s)\, [1 - T^e_{jj}(s)] \\ &= \sum_{r=1}^{N} \phi_{ij}(i,r)\, D^e_{ir}(s)\, T^e_{rj}(s) \end{aligned} \qquad (3.9)$$

since it is assumed that $T_{jj} = 0$.

In particular, if the total message arrival rate at channel $(k,\ell)$ is $\lambda_{k\ell}$, the capacity is $C_{k\ell}$, and the message length is exponential with mean $1/\mu$, then the total service time at channel $(k,\ell)$ is exponential with mean $1/\nu_{k\ell}$ where $\nu_{k\ell} = \mu C_{k\ell} - \lambda_{k\ell}$. Therefore, $D^e_{ir}(s) = \nu_{ir}/(s + \nu_{ir})$ and Equ. (3.9) becomes:

$$T^e_{ij}(s) = \sum_{r=1}^{N} \phi_{ij}(i,r)\, \frac{\nu_{ir}}{s + \nu_{ir}}\, T^e_{rj}(s) \qquad (3.10)$$

$$E(T_{ij}) = -\left. \frac{d}{ds} T^e_{ij}(s) \right|_{s=0}$$

$$E(T_{ij}) = \sum_{r=1}^{N} \phi_{ij}(i,r)\, [E(T_{rj}) + 1/\nu_{ir}] \qquad (3.11)$$

$$E(T_{ij}^2) = \left. \frac{d^2}{ds^2} T^e_{ij}(s) \right|_{s=0}$$

$$E(T_{ij}^2) = \sum_{r=1}^{N} \phi_{ij}(i,r)\, [E(T_{rj}^2) + 2E(T_{rj})/\nu_{ir} + 2/\nu_{ir}^2] \qquad (3.12)$$
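The recursions (3.11) and (3.12) give the two moments needed for the normal approximation. A minimal Python sketch follows, assuming loop-free routing and $T_{jj} = 0$ as in the derivation above; the function name and the dictionary layout are hypothetical.

```python
def delay_moments(i, j, phi, nu):
    """Return (E[T_ij], E[T_ij^2]) from Equ. (3.11) and (3.12).

    phi[(k, r)]: routing fraction of i-j traffic on channel (k, r)
    nu[(k, r)]:  nu_kr = mu * C_kr - lambda_kr for channel (k, r)
    """
    cache = {j: (0.0, 0.0)}  # T_jj = 0, so both moments vanish at j

    def moments(k):
        if k not in cache:
            m1 = m2 = 0.0
            for (a, r), frac in phi.items():
                if a != k:
                    continue
                e1, e2 = moments(r)  # moments of the remaining delay T_rj
                v = nu[(a, r)]
                m1 += frac * (e1 + 1.0 / v)
                m2 += frac * (e2 + 2.0 * e1 / v + 2.0 / v ** 2)
            cache[k] = (m1, m2)
        return cache[k]

    return moments(i)
```

The normal approximation would then use mean `m1` and variance `m2 - m1**2`. For two hops in tandem with $\nu = 1$ on each channel, the sketch returns a mean of 2 and a second moment of 6, i.e. a variance of 2, as expected for the sum of two independent exponential delays.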


If the routing variables $\phi_{ij}(k,\ell)$ correspond to loop-free routing [...] is an N by N square matrix with elements $\phi_{ij}(i,r)\, \nu_{ir}/(s + \nu_{ir})$. Hence [...]

3.5 Attempt to Model Message Broadcasts

In a communication network, it is sometimes necessary to broadcast messages from one node to other nodes in the network. After a node receives a broadcast message, copies of this message will be sent to its neighbours. In section 3.2, we have already pointed out some of the inadequacies of Jackson's Queues in modelling a communication network. In this section we shall discuss another inadequacy. We shall show that message broadcasts cannot be modelled by Jackson's Queues.

In Jackson's model, under equilibrium conditions, each individual queue in a network of queues is independent of other queues. We can consider a system of queues where, after being served at one queue, a customer (or message in the communication network context) splits into two and obtains service at two different queues.

In Fig. 3.3, nodes 1, 2 and 3 are three queues with exponential
service times. Messages arrive at nodes 1 and 2 independently in a
Poisson manner. After being served at node 1, each message splits into

By moving the first term on the right to the left hand side and
dividing by h, we get a differential equation in N(t). Setting $\frac{d}{dt}N(t) = 0$
for equilibrium conditions, we get:

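$$P[N=(m,n)]\,(\lambda_1+\lambda_2+2\mu) = P[N=(m-1,n-1)]\,\lambda_1 + P[N=(m-1,n)]\,\lambda_2 + P[N=(m+1,n)]\,\frac{\lambda_1\mu}{\lambda_1+\lambda_2} + P[N=(m+1,n-1)]\,\frac{\lambda_2\mu}{\lambda_1+\lambda_2} + P[N=(m,n+1)]\,\mu$$

such that $\sum_m \sum_n P[N=(m,n)] = 1$.

In addition, we must satisfy the boundary condition:

$$P[N=(0,0)]\,(\lambda_1+\lambda_2) = P[N=(1,0)]\,\frac{\lambda_1\mu}{\lambda_1+\lambda_2} + P[N=(0,1)]\,\mu$$

Suppose the compound pmf has a product form, i.e. $P[N=(m,n)] = Y^m X^n$;
then, writing $\lambda = \lambda_1 + \lambda_2$, the equilibrium balance equation gives:

$$YX(\lambda + 2\mu) = \lambda_1 + X\lambda_2 + Y^2X\lambda_1\mu/\lambda + Y^2\lambda_2\mu/\lambda + YX^2\mu \qquad (3.14)$$

while the boundary condition gives:

$$\lambda = Y\lambda_1\mu/\lambda + X\mu \qquad (3.15)$$

Solving Equ. (3.14) and (3.15) simultaneously, we get a quadratic in X:

$$X^2\left[\lambda\lambda_1\mu^2 + \lambda^2\mu^2\right] + X\left[\lambda_1^2\lambda_2\mu - 2\lambda^2\lambda_1\mu - 2\lambda^2\lambda_2\mu\right] + \lambda_1^3\mu + \lambda^3\lambda_2 = 0$$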

Solving this quadratic for X, we found that, depending on the values
of the parameters $\lambda_1$, $\lambda_2$ and $\mu$, we might get imaginary roots for X, and hence
imaginary roots for Y. Hence the assumption that $P[N=(m,n)] = Y^m X^n$ is
incorrect, and the compound pmf does not exhibit product form.
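The claim can be checked numerically. The sketch below assumes the quadratic obtained by eliminating Y between Equ. (3.14) and (3.15), with $\lambda = \lambda_1 + \lambda_2$: its coefficients are $a = \lambda\lambda_1\mu^2 + \lambda^2\mu^2$, $b = \lambda_1^2\lambda_2\mu - 2\lambda^2\lambda_1\mu - 2\lambda^2\lambda_2\mu$ and $c = \lambda_1^3\mu + \lambda^3\lambda_2$. The function name and the parameter values are illustrative choices, not taken from the thesis.

```python
import cmath


def product_form_roots(lam1, lam2, mu):
    """Return the discriminant and both roots X of the quadratic a*X^2 + b*X + c = 0."""
    lam = lam1 + lam2
    a = lam * lam1 * mu**2 + lam**2 * mu**2
    b = lam1**2 * lam2 * mu - 2 * lam**2 * lam1 * mu - 2 * lam**2 * lam2 * mu
    c = lam1**3 * mu + lam**3 * lam2
    disc = b * b - 4 * a * c
    root = cmath.sqrt(disc)  # complex square root, so a negative discriminant is handled
    return disc, ((-b + root) / (2 * a), (-b - root) / (2 * a))


# Illustrative parameters: lambda_1 = lambda_2 = 1 and mu = 4.
disc, roots = product_form_roots(1, 1, 4)
```

For these values the discriminant is negative, so both roots are complex and no real geometric parameter X exists.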

Thus Jackson's Queues cannot model message broadcasts. However,
in our Independent Queues Model, message broadcasts can be modelled
simply, since we assume that the queueing processes at different channels
are independent in equilibrium, and that the service time at each channel is
only a function of the total input traffic at that channel. This input
traffic may consist of external message arrivals, internal transfers
according to some routing strategy, and broadcast messages.

CHAPTER 4

DISTRIBUTED  DATABASE  MODEL 


In this chapter, we are going to describe the DDB model. In section 3.3,
we modelled the competition among messages (generated by transactions)
for the services of the communication channels. In this chapter, we are
also going to model the competition for the services of the local DBMS
at each computer site. Thus, in order for a transaction to complete
successfully, it has to wait not only for the services of the different
communication channels, but also for the service of the database management
system. This service includes setting locks, reading and writing data
items, etc.

Fig. 4.1 shows the basic architecture of a DDB. Database sites
are connected to each other via a communication subnetwork. At each
database site is a computer running one or both of the software modules:
Transaction Module (TM) and Data Module (DM). The TM supervises user
interactions with the database while the DM manages the data at each site.

We propose a 5-step approach to model the performance of concurrency
control algorithms. This is summarized in Fig. 4.2. We shall now describe
these five steps in more detail.

4.1 Input Data

Given  a  DDB  managed  on  an  arbitrary  communication  network,  we  have 
to  determine  the  following: 

(1) topology of the network, i.e. the connectivity and capacity of links
between computer sites.

(2) locations of all copies of files.

(3) transaction classes defined by their readsets and writesets.

(4) update request rates of the different transaction classes.

Figure 4.1 Basic Architecture of a DDB

Figure 4.2 Performance Model of a DDB (outputs shown: response time, transmission delay, delay due to conflict)

4.2  Transaction  Processing  Model 

Consider a transaction T arriving at database site i and processed by
$\mathrm{TM}_\alpha$. Suppose the readset of T consists of data items X, Y and its
writeset consists of data items U, V, where U = f(X,Y), V = g(X,Y).
This update will be performed in two steps, a query processing step and
a write step.

(1) Query Processing - $\mathrm{TM}_\alpha$ will choose one copy each of data items X and Y
and devise a query processing strategy that will produce the result
at database site i.

(2) Write - The new values of U and V will be written into the database.
This is accomplished by the two-phase commit algorithm.

(i) Pre-commits - $\mathrm{TM}_\alpha$ sends new values of U and V to all DM's having
copies of U and V, respectively. The DM's then copy the new
values to secure storage and acknowledge receipt.

(ii) Commits - After all DM's have acknowledged, $\mathrm{TM}_\alpha$ sends commit
messages, requesting the DM's to copy the new values of U and
V from secure storage into the database.
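The two phases of the write step can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the `DataModule` class, the `reachable` flag that stands in for an unreliable network, and the rule of discarding pre-committed values when any acknowledgement is missing are all assumptions introduced here.

```python
class DataModule:
    """Hypothetical DM holding copies of data items."""

    def __init__(self, name, reachable=True):
        self.name = name
        self.reachable = reachable   # stand-in for an unreliable network link
        self.secure_storage = {}
        self.database = {}

    def pre_commit(self, updates):
        # Phase (i): copy the new values to secure storage and acknowledge.
        if not self.reachable:
            return False
        self.secure_storage.update(updates)
        return True

    def commit(self):
        # Phase (ii): copy the values from secure storage into the database.
        self.database.update(self.secure_storage)
        self.secure_storage.clear()


def two_phase_commit(data_modules, updates):
    """Return True if every DM acknowledged and the update was committed everywhere."""
    acks = [dm.pre_commit(updates) for dm in data_modules]
    if not all(acks):
        # Assumed abort rule: discard pre-committed values, leave databases untouched.
        for dm in data_modules:
            dm.secure_storage.clear()
        return False
    for dm in data_modules:
        dm.commit()
    return True


good = [DataModule("a"), DataModule("b")]
ok = two_phase_commit(good, {"U": 1, "V": 2})

bad = [DataModule("a"), DataModule("c", reachable=False)]
failed = two_phase_commit(bad, {"U": 1, "V": 2})
```

In the second run one DM never acknowledges, so no copy is changed and the copies stay mutually consistent.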

Since the network is not perfectly reliable, if $\mathrm{TM}_\alpha$ asks the DM's
to copy the new values of U and V into the database in one step, it is
possible that some copies of U and V have received the updates while
other copies have not. This will result in database inconsistency.
Two-phase commit is an attempt to prevent such inconsistencies. It is
by no means the standard solution. However, it seems to be very popular
among researchers in DDB. (See for example, [BG80], [GRAY78].)

Step (1) of the Transaction Processing Model, namely Query Processing, is
a very hard problem. Since the database is redundant, there are different
DM's a transaction can access when it wants to read a certain data item.
The TM at the database site where the transaction originates is responsible
for choosing the best (in terms of an objective function such as minimization
of response time) DM to access. In the case of transactions that access
multiple files, the TM must devise a strategy to solve the query.

As mentioned in section 1.4.2, previous researchers got around the
query processing problem by assuming a fully redundant database, in which
case all queries are addressed to the local site and incur zero
communication delay. We do not believe this is a realistic assumption,
and are therefore confronted with the problem of modelling query processing. In
particular, since our major thrust is the performance comparison of
concurrency control algorithms, we need a query processing model that is
simple while at the same time not too unrealistic. This is the object of
the next chapter.

Using our transaction processing model, we can determine, for each
particular transaction, the file transfers, read requests and update
messages that are necessary. This information, together with the transaction
arrival rates, the file locations, etc., lets us generate estimates for
$f_{ij}$, the arrival rate of messages at site i destined for site j.


4.3  Communication  Subnetwork  Model 


Using the message flow requirements between database sites, $f_{ij}$, and
the network topology as input to a routing strategy, such as Gallager's
Minimum Delay Routing Strategy [GALL77], we can determine the total traffic
on each channel of the network. We can then find the average service
time, utilization and throughput at each channel.

4.4 Conflict Model

This  is  an  important  component  of  the  DDB  Model  and  will  be  discussed 
in  more  detail  in  Chapter  6.  Briefly,  it  allows  us  to  find  the  probability 
of  conflict  between  different  transactions  and  the  cost  of  these  conflicts. 

4.5 Performance Measures

We  emphasize  the  performance  measure  most  visible  to  the  users, 
namely  response  time,  which  is  the  sum  of  local  processing  delay  at  the 
database  sites,  transmission  delay  and  delay  due  to  conflicts. 


CHAPTER 5
QUERY  PROCESSING 

Accessing data distributed at different computer sites necessitates
transmission over communication links. An advantage of the distributed
system is the ability to process and transmit data in parallel at
separate points in the network. Since the delay due to communications
is substantial, the DBMS must devise an efficient arrangement of local
data processing and data transmissions in order to process distributed
queries.

5.1 Review of Existing Query Processing Algorithms

There are very few reports of work on distributed query processing.
Wong proposed an algorithm that is being implemented in SDD-1 [WONG77].
Hevner and Yao [HY79] proposed a simple algorithm for a special class
of queries. These two approaches will be discussed in more detail in
the next sections. More recently, Chiu [CHIU79] has devised a dynamic
programming solution for certain queries called tree queries. Since the
applicability of his method is extremely restricted, we shall not discuss
it.

5.1.1 Relational Data Model

In this section, we shall describe the relational data model ([CODD70],
[DATE77]), which is the model assumed in both Wong's Algorithm [WONG77]
and Hevner and Yao's Algorithm [HY79].

A data model is a class of data structures. All information maintained
in the database is organized as data structures that are instances of
this class.

Given n sets $D_1, D_2, \ldots, D_n$, a relation R is a subset of their Cartesian
product, i.e. $R \subseteq D_1 \times D_2 \times \cdots \times D_n$, and $D_1, \ldots, D_n$ are called the domains
of the relation. A database relation is a time-varying subset of the
Cartesian product, i.e. $R(t) \subseteq D_1 \times D_2 \times \cdots \times D_n$. The relational model
of a DDB assumes that the unit of data distribution is a relation. In
each database site, there are one or more relations. A database query
performs the operations restriction, projection, join and semi-join in
order to retrieve data. (See [CODD70], [WONG77], [CHIU79].) A restriction
of R selects rows of R that satisfy specified data conditions, e.g. a value > 100.
A projection of R is formed by deleting some of the domains. For example,
$R[D_1, D_2]$ is a projection of R consisting of the first two domains. Consider
two relations R(A,B)* and S(C,D). The θ-join of R on A and S on C is

$$R[A\,\theta\,C]S = \{\, r\|s : r \in R \text{ and } s \in S \text{ and } (r[A]\;\theta\;s[C]) \,\}$$

where $r\|s = (a,b,c,d)$ if r = (a,b) and s = (c,d), and $\theta \in \{<, \le, >, \ge, =, \ne\}$.

For example:

R[shipid = shipid]S

* R(A,B) denotes a relation with only two domains A and B.

The =-join can also be written $R \bowtie_{A=C} S$. Let R(A,B) and S(B,C) be
two relations; then the semi-join $R \ltimes S = (R \bowtie_{B=B} S)[A,B]$, i.e. $R \ltimes S$ is the
join of R and S projected back onto R.
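The four operations can be made concrete with a small sketch that represents a relation as a set of tuples and refers to domains by position. The function names and the sample relations are illustrative assumptions, not part of the cited definitions.

```python
import operator


def restriction(rel, pred):
    """Rows of rel that satisfy the condition pred."""
    return {row for row in rel if pred(row)}


def projection(rel, cols):
    """Keep only the domains at positions cols (duplicates collapse in the set)."""
    return {tuple(row[c] for c in cols) for row in rel}


def theta_join(r, a, s, c, theta=operator.eq):
    """Concatenate r-rows and s-rows whenever r[a] theta s[c] holds."""
    return {x + y for x in r for y in s if theta(x[a], y[c])}


def semi_join(r, a, s, c):
    """Equi-join of r and s projected back onto the domains of r."""
    width = len(next(iter(r))) if r else 0
    return projection(theta_join(r, a, s, c), range(width))


# Hypothetical relations R(A, B) and S(B, C).
R = {(1, 10), (2, 20), (3, 30)}
S = {(10, "p"), (30, "q"), (40, "r")}

big = restriction(R, lambda row: row[1] > 15)
first = projection(R, [0])
joined = theta_join(R, 1, S, 0)
reduced = semi_join(R, 1, S, 0)
```

Here `reduced` keeps only the rows of R whose B value also appears in S, which is why semi-joins are attractive for shrinking a relation before it is transmitted.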

5.1.2  Wong's  Algorithm 

Wong  makes  the  following  assumptions: 

(1)  Consider  subsets  of  the  DDB  called  "materializations"  such  that 
each  consists  of  exactly  one  copy  of  each  portion  of  the  database. 

Each  materialization  is  a  distributed  but  non-redundant  version  of 

the  database.  Data  retrieval  will  be  performed  on  one  materialization. 

(2)  Only  fragments  of  a  relation,  i.e.  restrictions,  projections,  or  a 
combination  of  them  are  moved  from  one  site  to  another. 

(3) We are indifferent as to where the final result is produced, so long
as it is produced at a single site.

(4) The communication cost as a function of data volume and links is
known.

(5) The costs of local operations, i.e. projection, restriction, and join
are known.

(6) The sizes of resulting fragments after local operations are known.

The algorithm is as follows:

(1) Perform all local operations that are possible. Choose one site to
move all fragments to. Denote the set of moves by $M = \{m_1, m_2, \ldots, m_m\}$,
where each $m_k$, k = 1,...,m, is of the form "move fragment x from node i
to node j".

(2) Replace M by two sets of moves $M_1$ and $M_2$ that are to be executed
sequentially with local processing between them. This is represented
graphically as a tree with root M and leaves $M_1$ and $M_2$,
where it is understood that the left leaf ($M_1$) precedes the right ($M_2$)
with local processing in between. The criteria for splitting a node N
into ($N_1$, $N_2$) are:

(i) the combined cost of ($N_1$, $N_2$) is less than that of N alone, and

(ii) the pair ($N_1$, $N_2$) is minimum in cost among all pairs satisfying (i).

It is possible that we can continue to split the nodes using the above
criteria and in general end up with a binary tree where the leaf nodes
represent all the moves that are to be undertaken. The moves within
one leaf can be made in parallel, and the leaves are executed sequentially
from left to right.

(3) The algorithm stops when no further node splitting is possible.

Note that Wong's Algorithm is a greedy algorithm. It looks for immediate
gains and will give us a local optimum but not necessarily a global one.

5.1.3  Hevner  and  Yao's  Algorithm 

The  following  assumptions  are  made: 

(1)  &  (2)  same  assumptions  as  Wong's  algorithm. 

(3)  The  result  is  to  be  produced  at  the  node  where  the  query  originates. 

(4) The communication cost between any two nodes is defined as a linear
function $C(X) = c_0 + c_1 X$, where X is the amount of data transmitted, and
$c_0$ and $c_1$ are constants. In other words, the topology of the network
and the queueing effects are ignored. The cost is measured in units
of time.

(5) The costs of local operations, i.e. projections, restrictions, etc.
are negligible compared to the communication costs.

(6) It is assumed that after initial local processing, each relation (or file)
accessed in the query contains only one domain - the common joining
domain. When file i, of size $S_i$, is processed with file j, the resulting
file has size $S_i p_j$*, where $p_j$, the selectivity parameter of file j, is
between 0 and 1. The selectivity parameter is cumulative, and if file i
is processed with both file j and file k, the resulting file will have
size $S_i p_j p_k$.

The data transmission containing the transmission of relation i, $R_i$,
to the result node is called the schedule of $R_i$. Define the schedule response
time $r_i$ as the time from the start of the transmission until $R_i$ (or a
processed version of it) is received at the result node. Define the minimal
schedule response time $\hat{r}_i = \min r_i$, where the minimization ranges over all
possible schedules. The response time r of a query retrieval strategy is
defined as $r = \max_{1 \le i \le m} r_i$, where m is the number of relations in the query, and
the total time t is defined as the sum of all transmission costs for the strategy.

Hevner and Yao proposed two algorithms: a parallel strategy to
minimize r and a serial strategy to minimize t.

The parallel strategy uses the initial feasible solution of sending
each relation directly to the result node as a starting strategy. It then
searches for cost beneficial data transmissions by trying to join small
relations to large relations. The strategy is described as follows:

(1) Relations $R_i$ are ordered so that $S_1 < S_2 < \ldots < S_m$.

(2) Repeat steps (3) to (4) for i = 1 to m.

(3) Find the minimal schedule response time $\hat{r}_i$. All relations $R_j$ where
j < i are checked for potential data transmission to $R_i$, and the
transmission that produces the greatest reduction in $r_i$ is integrated
into the strategy. For the data transmission from $R_j$ to $R_i$, the


* While Hevner and Yao [HY79] did not state it, the implicit assumption
is that $p_i S_j = p_j S_i$.

(4) All files accessed by the query have the same size.

(5) The selectivity parameter of all files is one. Therefore, when two
or more files are processed together, the resulting file has the
same size as one of the original files.

Assumptions (3), (4) and (5) imply that the cost of a query processing
strategy is a function of the communication links employed in the strategy,
irrespective of the volume of traffic on these links.

In addition, we are restricting all file processing to be performed
at the result node or at the nodes where the file copies accessed by the
query are located. This restriction is necessary in order for the MST
Algorithm to be optimal. If file processing can be performed at any node
in the communication subnetwork, then in order to find the optimal strategy,
we have to solve a Steiner Problem, which is a much harder optimization
problem than the MST. This will be discussed in more detail in the next
section.


The MST Algorithm finds the optimal query processing strategy that
minimizes the total communication costs. Recall the communication
subnetwork model described in section 3.3. The average delay T for all
messages in the network is given by Equ.(3.5):

$$T = \frac{1}{\gamma}\sum_i \frac{\lambda_i}{\mu C_i - \lambda_i}$$

where the summation is taken over the set of all channels. Hence,

$$\frac{\partial T}{\partial(\lambda_i/\mu)} = \frac{C_i}{\gamma\,(C_i - \lambda_i/\mu)^2}$$

is the incremental delay for all messages in the network per unit increase
in flow at channel i, provided the increase in flow is small compared to
the existing traffic at the channel. If we let $\partial T/\partial(\lambda_i/\mu)$ be the
communication cost on channel i, the MST Algorithm will output a strategy
that minimizes $\sum_i \partial T/\partial(\lambda_i/\mu)$. The cost of a strategy is
proportional to $\sum_i \partial T/\partial(\lambda_i/\mu)$ because of unit file size. Therefore,
the MST strategy will minimize the incremental delay for all messages in
the network due to a particular query. Obviously, other communication
costs can be used.
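As a hedged illustration of this channel cost, the sketch below evaluates the average delay $T = \frac{1}{\gamma}\sum_i \lambda_i/(\mu C_i - \lambda_i)$ and its derivative with respect to the flow $\lambda_i/\mu$ on one channel, $C_i / (\gamma (C_i - \lambda_i/\mu)^2)$. Treating $\gamma$ as the total external arrival rate, as well as the function names and the numbers used, are assumptions made for this example.

```python
def average_delay(flows, capacities, mu, gamma):
    """Average message delay T over all channels (flows are the lambda_i)."""
    return sum(lam / (mu * cap - lam) for lam, cap in zip(flows, capacities)) / gamma


def incremental_delay_cost(lam_i, cap_i, mu, gamma):
    """Derivative of T with respect to the flow lam_i / mu on a single channel."""
    return cap_i / (gamma * (cap_i - lam_i / mu) ** 2)


# Hypothetical two-channel network.
flows, caps, mu, gamma = [2.0, 1.0], [4.0, 3.0], 2.0, 3.0
cost0 = incremental_delay_cost(flows[0], caps[0], mu, gamma)

# Finite-difference check: raise the flow lam_0 / mu by h, i.e. lam_0 by h * mu.
h = 1e-6
bumped = [flows[0] + h * mu, flows[1]]
numeric = (average_delay(bumped, caps, mu, gamma)
           - average_delay(flows, caps, mu, gamma)) / h
```

The closed form and the finite-difference estimate agree, which is what justifies using the derivative as a per-channel weight when the added query traffic is small.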

There  are  two  cases  to  consider:  non-redundant  files  and  redundant 
files. 

5.2.1  Non-redundant  Files 

In  this  case,  each  file  accessed  by  the  query  has  only  one  copy 
maintained  in  the  database. 

Define a directed tree as a directed graph without a circuit, for
which the outdegree of every node is unity, the outdegree of the root
node being zero. Note that our definition of a directed tree is
different from the usual definition (see, e.g. [CHRI75]) in that our
directed links point towards the root node, rather than outwards from
the root node.

We next describe the MST Algorithms for non-redundant files. There
are two algorithms: (1) the MST1 Algorithm restricts all file processing
to the node set N, where N is the set of nodes consisting of the result
node R and the nodes where the file copies accessed by the query are
located; (2) the MST2 Algorithm allows file processing at all nodes.

The MST1 Algorithm

(1) Using the communication costs on the links of the communication
subnetwork as input to a Shortest Path Algorithm, find the shortest
paths between every pair of nodes in N. In general, the shortest
path from node i to node j will have a different length from the node
j to node i path.

(2) Construct a fully connected directed graph G with node set N, and
link weights equal to the shortest path lengths between nodal
pairs as calculated in Step (1).

(3) Find the minimum weight directed spanning tree of G using node R as
the root node of the tree.

(4) Each file is moved to the result node R using the directed path
dictated by the MST. When two paths intersect, the two corresponding
files are processed together, resulting in one file.
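Steps (1) to (3) can be sketched for small node sets as below. This is only an illustration: shortest paths are found with Floyd-Warshall, and the minimum weight directed spanning tree is found by brute-force enumeration of successor choices rather than by an efficient arborescence algorithm. The function names and the four-node network are assumptions, not the network of Fig. 5.1.

```python
from itertools import product

INF = float("inf")


def all_pairs_shortest(nodes, cost):
    """Floyd-Warshall over directed channel costs given as {(u, v): weight}."""
    d = {(u, v): 0 if u == v else cost.get((u, v), INF)
         for u in nodes for v in nodes}
    for k in nodes:
        for u in nodes:
            for v in nodes:
                if d[u, k] + d[k, v] < d[u, v]:
                    d[u, v] = d[u, k] + d[k, v]
    return d


def reaches(node, succ, root):
    """Follow successor links from node; True if root is reached without a cycle."""
    seen = set()
    while node != root:
        if node in seen:
            return False
        seen.add(node)
        node = succ[node]
    return True


def mst1(nodes, cost, root, file_sites):
    """Return (total cost, successor map) of the cheapest tree directed towards root."""
    d = all_pairs_shortest(nodes, cost)
    others = [n for n in file_sites if n != root]
    members = [root] + others            # the node set N
    best = (INF, None)
    # Every non-root node of N has outdegree one: try each choice of successor.
    for choice in product(members, repeat=len(others)):
        succ = dict(zip(others, choice))
        if any(u == v for u, v in succ.items()):
            continue
        if not all(reaches(u, succ, root) for u in others):
            continue
        total = sum(d[u, v] for u, v in succ.items())
        if total < best[0]:
            best = (total, succ)
    return best


# Hypothetical network with symmetric channel costs; files sit at nodes 2, 3, 4.
cost = {}
for u, v, w in [(1, 2, 1), (2, 3, 1), (3, 4, 1), (1, 4, 5)]:
    cost[u, v] = cost[v, u] = w
total, tree = mst1([1, 2, 3, 4], cost, root=1, file_sites=[2, 3, 4])
```

The enumeration grows exponentially with the number of file sites, so it is suitable only for checking small examples by hand.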

The algorithm is best illustrated by an example. Consider the
six-node communication network shown in Fig. 5.1(a). A query originating
at node 1 accesses files X, Y and Z at nodes 2, 3 and 4 respectively. The
MST1 Algorithm says that we shall first find the shortest paths between
every pair of nodes in the set of nodes {1,2,3,4}. We then construct a
fully connected graph with node set {1,2,3,4} and link weights equal
to the shortest path lengths (see Fig. 5.1(b)). The MST consists of
the directed links (3,4), (4,2) and (2,1) (see Fig. 5.1(c)). This means
that file Y should be sent to node 4, to be processed with file Z. The
resulting file is then sent to node 2, to be processed with file X, and
the final result is then sent to node 1.

The correctness of the MST1 Algorithm is based on the following
theorem.

Theorem 5.1 Under the assumptions of the MST1 Algorithm, each query
processing strategy is dominated by (i.e. is more costly than) a spanning
tree strategy with node R, the result node, as the root node.

Proof: The specification that R is the root node is necessary since we
want the result produced at R. Consider a fully connected directed
graph G with node set N, i.e. the set of nodes consisting of R and the
nodes where the files are located. Every query strategy corresponds to
a subgraph of G. If the strategy does not correspond to a tree, some
node i ∈ N will have an outdegree greater than one, say two. This means
that the file located at node i will be sent out twice through two
different links. This is obviously inferior to just sending the file
through one of the two links.

Q.E.D.

The MST1 Algorithm is optimal since the MST is the least costly
among all spanning trees.

The MST2 Algorithm

(1) Using the communication costs on the communication links as the
weights of the links, find a minimum weight tree that spans the node
set N, with node R as the root of the tree.

(2) Each file is moved to the result node R using the directed path dictated
by this minimum weight tree. When two paths intersect, the two
corresponding files are processed together, resulting in one file.

Step (1) of the MST2 Algorithm corresponds to finding a solution
to the Steiner Problem, i.e. finding a minimum weight tree that spans
a subset of the nodes of a graph. The Steiner Problem is a much harder
problem than the MST Problem. Therefore, in this thesis, we shall
employ the MST1 Algorithm, i.e. restricting all file processing to the
node set N.

5.2.2 Redundant Files


In this case, each file accessed by the query may have one or more
copies maintained in the database.

By inventing artificial file nodes and artificial links of weight W
connecting each file node with its copy locations, the MST Algorithm
that we have developed for the non-redundant case can be extended to be
used for the redundant case.

Consider Fig. 5.2, where we have a request accessing two files X
and Y. There are copies of X at nodes 2 and 3, and copies of Y at nodes 4
and 5. Note that we have created directed artificial arcs (X,2), (X,3),
(Y,4) and (Y,5) with weights W. The direction of the artificial arcs for
file X (or Y) ensures that only one of them will be included in the optimal
strategy. This is necessary since only one of the copies of X (or Y)
will be accessed. The weight W can be chosen to be zero, or may be assigned
to be different for different artificial links, to reflect the different
costs of accessing a file at different copy locations.

A strategy to satisfy a query originating at node 1 and accessing
files X and Y will be represented by a tree spanning node 1 and the
artificial nodes X and Y, but not necessarily spanning the whole node set.
The optimal strategy corresponds to the minimum weight tree of this type,
i.e. a Steiner Tree.

In general, if node i is the requesting node and files X, Y, ..., Z
are accessed in the query, then the optimal strategy corresponds to
finding the minimum weight tree spanning the node set U, where
U = {i, X, Y, ..., Z}.

The Steiner Problem can be solved by solving a number of MST's. For
example, if we want to solve the Steiner Problem for the node set U in
a graph with node set V, then we have to solve MST for all subgraphs of
V that contain U [HAKI71]. Unfortunately, the number of subgraphs is large.
However, because of the special structure of the query processing problem,
only some of these subgraphs will correspond to query processing strategies.
Consider the example shown in Fig. 5.2; all we need to consider are the four
subgraphs shown in Fig. 5.3.

In general, if we have m files and n copies of each file, the number
of subgraphs to consider is $n^m$.
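The count $n^m$ arises because each candidate subgraph fixes one copy location per file. A minimal sketch of that enumeration follows; the function name `copy_choices` is an assumption, and the copy locations are those quoted in the text for files X and Y.

```python
from itertools import product


def copy_choices(copies):
    """List every way of picking exactly one copy location for each file."""
    files = sorted(copies)
    return [dict(zip(files, sites))
            for sites in product(*(copies[f] for f in files))]


# Copies of X at nodes 2 and 3, copies of Y at nodes 4 and 5.
choices = copy_choices({"X": [2, 3], "Y": [4, 5]})
```

Each entry of `choices` identifies one non-redundant problem, so the redundant case can be handled by solving the non-redundant MST once per entry and keeping the cheapest result.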

This artificial file node and artificial link technique can be used
to generalize Wong's Algorithm [WONG77] to redundant databases. Hevner
and Yao's Algorithm [HY79], on the other hand, cannot be generalized easily,
since they assumed the same communication costs between each pair of nodes
and do not allow us to differentiate the costs of accessing different
copies.

5.3 The Minimum Distance Tree Algorithm

The assumptions of the Minimum Distance Tree (MDT) Algorithm are the
same as those of the MST Algorithm and are listed in section 5.2. While
the MST Algorithm minimizes the total communication costs, the MDT
Algorithm minimizes the maximum of the communication costs for sending
each file to the requesting node. In particular, if we designate the
transmission delays on the communication channels as the communication
costs, the MDT Algorithm will find the query processing strategy corresponding
to minimum response time.

The MDT Algorithm

(1) Construct a directed graph H of the communication subnetwork. The
nodes of H are the nodes of the communication subnetwork and the links
of H are the communication channels with weights equal to the
communication costs of the channels.

(2) Create artificial file nodes for all files accessed by the query, and
create artificial links of zero weight connecting each file node and
its copy locations. These artificial links should be directed outwards
from the file nodes.

(3) Find the shortest directed paths from all file nodes to the requesting
node. These shortest paths correspond to the paths taken for each
individual file to reach the requesting node.

[Figure: subgraphs to be considered in the Steiner Algorithm]
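The three MDT steps can be sketched with Dijkstra's shortest path algorithm. This is an illustrative sketch under stated assumptions: the function names, the `("file", name)` labels for artificial file nodes, and the four-node network are invented for this example and are not the network of Fig. 5.1.

```python
import heapq
from itertools import count


def shortest_path(graph, src, dst):
    """Dijkstra on graph = {node: {neighbour: weight}}; returns (distance, path)."""
    dist, prev = {src: 0}, {}
    tie = count()                      # tie-breaker so nodes are never compared
    heap = [(0, next(tie), src)]
    while heap:
        d, _, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                   # stale heap entry
        if u == dst:
            break
        for v, w in graph.get(u, {}).items():
            if d + w < dist.get(v, float("inf")):
                dist[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, next(tie), v))
    if dst not in dist:
        return float("inf"), []
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return dist[dst], path[::-1]


def mdt(channels, copies, requester):
    """Steps (1)-(3): build H, add zero-weight file arcs, route each file."""
    graph = {}
    for (u, v), w in channels.items():          # step (1)
        graph.setdefault(u, {})[v] = w
    for name, sites in copies.items():          # step (2)
        graph[("file", name)] = {site: 0 for site in sites}
    routes = {name: shortest_path(graph, ("file", name), requester)
              for name in copies}               # step (3)
    response_time = max(d for d, _ in routes.values())
    return routes, response_time


# Hypothetical network with symmetric channel costs.
channels = {}
for u, v, w in [(1, 2, 1), (2, 3, 1), (3, 4, 1), (1, 4, 5)]:
    channels[u, v] = channels[v, u] = w
routes, response_time = mdt(channels, {"X": [2], "Y": [3], "Z": [4]}, requester=1)
```

Because a file node may be linked to several copy locations, the same code covers the redundant case: the shortest path simply leaves the file node through the cheapest copy.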

Note  that  the  MDT  Algorithm  is  the  same  for  both  redundant  and 

non-redundant  databases.  In  addition,  since  we  are  only  interested  in 

finding  the  shortest  path  for  a  file  to  reach  the  requesting  node,  the 

restriction  that  all  file  processing  must  be  performed  at  the  node  set  N 

can  be  removed.  (Recall  that  N  consists  of  the  result  node  and  the  nodes 

containing  copies  of  the  files  accessed  by  the  query) . 

An example of the MDT Algorithm is shown in Fig. 5.1(a), in which
a query originating at node 1 accesses files X, Y and Z at nodes 2, 3 and 4
respectively. The shortest paths from nodes 2, 3 and 4 to node 1 are as
shown in Fig. 5.1(d). These are the paths taken by the files to reach
the result node. The MDT Algorithm is based on the following theorem.

Theorem 5.2: Under the assumptions of the MDT Algorithm, the MDT
Algorithm will minimize the response time.

Proof: Consider a query accessing files 1, 2, ..., n. The response time, by
definition, is $\max_{1 \le i \le n} F_i$, where $F_i$ is the time it takes file i to reach the
requesting node. Therefore, to minimize the response time, we have to
find $\min(\max_i F_i)$. The MDT Algorithm accomplishes this by minimizing each
$F_i$, $1 \le i \le n$.

Q.E.D.

In general, solutions that minimize response times are not unique.
So long as the path taken by the file corresponding to the maximum
communication delay is unchanged, the paths taken by the other files can
be varied without affecting the response time.

5.4  Comparison  of  Query  Processing  Algorithms 

Wong's algorithm attempts to solve the most general query processing
problem, where (1) files have different sizes and selectivity parameters
are arbitrary, and (2) the communication subnetwork is completely general.
Unfortunately, this general problem is very difficult and Wong's attempt
only resulted in a heuristic giving a local optimum.

Hevner and Yao looked at a simpler problem, relaxing requirement (2)
by assuming that the communication cost between each pair of nodes depends
linearly on the volume of data moved.

The MST Algorithm and MDT Algorithm described in this thesis relax
requirement (1) by assuming same size files and selectivity parameters
of value one. These algorithms are easy to implement and to analyze.

Note that the MST Algorithm is significantly different from previous
work. That it is different from Hevner and Yao's algorithm is obvious,
but is it also different from Wong's Algorithm if we relax requirement (1)
in implementing Wong's algorithm? The answer is yes. Intuitively, Wong's
algorithm does not guarantee a global optimum, which the MST Algorithm does,
so they cannot be the same. Fig. 5.4 contains a simple counterexample,
in which a query tries to access files X, Y and Z. Wong's Algorithm says
that the first proposed solution is to send each file directly to the
result node (Step 1 in Fig. 5.4). We then try to replace this set of moves
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by  two  sets  of  moves  that  are  less  costly  than  the  present  set  of  moves. 

In this case, we see that it is better to move file Y from node 3 to node 2
first, to be processed with file X, and then send the result to node 1.
The move of file Z from node 4 to node 1 remains unchanged (Step 2 in Fig. 5.4).
At this point, no improvement can be made. Wong's Algorithm requires a
cost of 5 units, while the MST Algorithm only costs 4 units.

The  MST  Algorithm  and  the  MDT  Algorithm  also  enjoy  the  following 
advantages : 

(1)  We  retain  the  traditional  layering  approach,  i.e.  separating  the 
database  from  the  underlying  communication  subnetwork.  The  latter 
will  provide  the  input  for  our  algorithms:  the  link  costs  in  the 
MST  and  the  MDT  Algorithms. 

(2)  The minimum spanning tree problem and the shortest path problem in
a directed graph are well understood and there exist efficient
algorithms for their solutions.

(3)  Under the assumptions we make, the MST and the MDT Algorithms will
solve the problem optimally. Other query processing algorithms, for
example Wong's algorithm, only achieve a local optimum. In addition,
Wong's algorithm only guarantees that the solution to the query will
be available at one site, and is indifferent as to which site it is.
The two algorithms proposed in this thesis, on the other hand,
guarantee that the result will be at the requesting node.

(4)  The MST and MDT Algorithms can be easily generalized to accommodate
redundant file copies. While Wong's Algorithm can also be generalized
using the same technique, Hevner and Yao's Algorithm cannot be
generalized easily.

(5)  The two algorithms proposed are easy to analyze.
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CHAPTER  6 
CONFLICT  MODELS 


In this chapter we shall describe in detail how to model the conflicts
between transactions under various concurrency control algorithms. In
particular, we shall calculate the probability of conflicts and the delay
due to conflicts for Two-phase Locking Algorithms and for Timestamp
Ordering Algorithms. The performance of locking algorithms depends very
much on the algorithm used to solve the deadlock problem. In section
6.2.1, we analyze the Ordered Queues Algorithm for deadlock prevention.
In section 6.2.2, we analyze the Prioritized Transactions Algorithm.
In section 6.2.3, we calculate the probability of deadlocks when
transactions are allowed to wait for each other in an uncontrolled manner,
which is the case under Deadlock Detection. Section 6.3 is devoted to SDD-1,
a timestamp ordering algorithm.

6.1  Model Assumptions

The  following  assumptions  are  made  in  this  chapter: 

(1)  transaction arrivals are Poisson and divided into transaction classes
defined by readsets and writesets.

(2)  topology  of  network  and  location  of  copies  of  files  are  given. 

(3)  there are two message types: a short type with mean length $1/\mu_1$,
such as lock requests, requests to read files, etc.; and a long type with
mean length $1/\mu_2$, such as file transfers, pre-commits, etc. Both types
of messages are assumed to have exponentially distributed lengths. Suppose
the short and long messages constitute fractions $\gamma_1$ and $\gamma_2$
of the total number of messages. We also make the approximation that the
length of all messages is still exponential with mean

$$\frac{1}{\mu} = \frac{\gamma_1}{\mu_1} + \frac{\gamma_2}{\mu_2}$$


(4)  Transactions will be processed according to the Transaction Processing
Model, which consists of two steps: a query processing step and a write step.
In the query processing step, the MST1 Algorithm will be used to
produce the result of the query at the request node. In the write
step, the request node will initiate the two-phase commit algorithm to
all nodes containing a copy of the files in the writeset.

(5)  Approximate  the  end-to-end  transmission  delay  of  the  communication 
subnetwork  as  exponential. 

(6)  One important parameter in locking algorithms is the locking granularity,
i.e. the size of the unit of the database which is individually locked.
Using simulation models, Ries and Stonebraker [RS77] showed that under
a wide variety of conditions, coarse granularity gives shorter response
times. Therefore, in our performance models, coarse granularity is
assumed. In particular, the numerical examples in Chapter 7 assume
that whole files are individually locked. This not only simplifies
the performance model, but also the data collection necessary to
execute the model.

(7)  Requests  for  locks  are  served  in  a  FCFS  manner  and  the  capacity  of 
the  queue  is  infinite. 

(8)  Two  transactions  conflict  when  they  try  to  access  the  same  data  item 
and  at  least  one  of  them  is  a  write  request. 


6.2  Two-Phase Locking

When two transactions conflict under the locking algorithm, one of
them is made to wait until the other releases its locks. This waiting
incurs delay on the transaction and can be modelled as a queue. Consider,
for example, a file X with redundant copies $X_1$, $X_2$ and $X_3$. In the
Centralized Locking Algorithm, there is one lock manager for file X and it is
located at the central node. All transactions trying to access file X must
request a lock on X from this lock manager. In the Distributed Locking
Algorithm, there will be three lock managers, one for each redundant copy.
Each of these lock managers will be located at the same node as the file copy
it manages. A transaction accessing $X_i$, i = 1,2,3, will request a lock on
$X_i$ from the lock manager of $X_i$. Other files in the database will be
managed by their respective lock managers, and each lock manager can be
modelled as a queue. The service time at each queue corresponds to the length
of time a transaction will hold a lock on the file.

6.2.1  Ordered Queues for Deadlock Prevention

One way to prevent deadlocks is to require transactions to request
locks in some universally specified order, i.e. wait for file X first, then
Y, then Z, etc. Therefore, the transaction has to obtain service at a
network of queues (see Fig. 6.1). The arrival rates of the different
transaction classes to the lock managers will let us calculate the external
arrival rates and the routing probabilities for this network of queues. For
example, consider two transaction classes with arrival rates $\lambda_1$ and
$\lambda_2$. The first class of transaction accesses both files X and Y,
while the second class accesses files X and Z. The total external arrival
rate to queue X is $\lambda_1 + \lambda_2$. After obtaining service at queue X,
with probability $\lambda_1/(\lambda_1+\lambda_2)$ the request will go to
queue Y, and with probability $\lambda_2/(\lambda_1+\lambda_2)$ it will
go to queue Z.
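
The bookkeeping in this example can be sketched in a few lines. The function name, the input format and the numerical rates are hypothetical; the sketch only assumes that every class requests its locks in one fixed global order, as described above.

```python
from collections import defaultdict

def ordered_queue_network(classes, lock_order):
    """Derive external arrival rates and routing probabilities.

    classes    : list of (arrival_rate, set_of_files) pairs (hypothetical format)
    lock_order : the universally specified locking order, e.g. "XYZ"
    A destination of None means the request leaves the network.
    """
    external = defaultdict(float)   # queue -> external arrival rate
    through = defaultdict(float)    # queue -> total rate passing through it
    flow = defaultdict(float)       # (queue, next_queue) -> rate on that route
    for rate, files in classes:
        path = [f for f in lock_order if f in files]
        external[path[0]] += rate
        for here, nxt in zip(path, path[1:] + [None]):
            through[here] += rate
            flow[(here, nxt)] += rate
    routing = {edge: r / through[edge[0]] for edge, r in flow.items()}
    return dict(external), routing

# Class 1 locks X then Y at rate 2.0; class 2 locks X then Z at rate 3.0.
external, routing = ordered_queue_network(
    [(2.0, {"X", "Y"}), (3.0, {"X", "Z"})], "XYZ")
```

With these made-up rates, queue X receives an external rate of 5.0 and routes two fifths of its departures to queue Y and three fifths to queue Z, matching the expressions in the text.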


[Figure 6.1: Ordered Queues for Deadlock Prevention Modelled as a Queueing
Network. Legend: external arrivals, routing probabilities.]
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The  service  time  at  each  of  the  queue  in  this  network  of  queues,  say 
queue  X,  corresponds  to  the  time  a  transaction  i  accessing  file  X  will 
hold  the  lock  on  X.  This  service  time  is  the  sum  of  : 

(1)  transmission  delay  from  lock  manager  to  request  node  (lock  grant), 

(2)  distributed processing,

(3)  transmission  delay  from  request  node  to  lock  manager  (lock  release) . 

The average service time must be weighted by the arrival rates of
the different transaction classes. Thus if the service time due to class 1
transactions $T_1$ is $x_1$ and that due to class 2 transactions $T_2$ is
$x_2$, then the average service time will be
$(\lambda_1 x_1 + \lambda_2 x_2)/(\lambda_1+\lambda_2)$, where $\lambda_1$,
$\lambda_2$ are the arrival rates of $T_1$ and $T_2$ respectively.

Our problem is complicated by the fact that a transaction will hold
all locks granted until it has finished service. This means that while it
is waiting for lock Y, for example, while already holding lock X, all other
transactions waiting for lock X will be blocked.

We  now  make  the  additional  approximation  that  not  only  do  transactions 
have  to  request  locks  in  a  specific  order,  they  also  have  to  get  served  in 
that  order.  For  example,  if  transaction  i  wants  to  read  X  and  Y,  it  is 
required  to  get  a  lock  on  file  X,  after  which  it  will  read  X.  Then  it  will 
get  a  lock  on  file  Y,  and  then  read  Y.  This  is  an  approximation  because 
in  a  real  database  system,  in  order  to  permit  more  concurrency,  as  soon  as 
transaction  i  gets  the  lock  on  X,  it  will  often  start  queueing  for  the  lock 
on  Y. 

This approximation is necessary to simplify the model. If we further
assume that lock requests arrive in a Poisson manner and service time for
each lock request is exponential, the Ordered Queues Algorithm can be
approximated as a Markov Chain. Consider, for example, a two file system
shown in Fig. 6.2(a).

There are three possible kinds of lock requests:

(1)  lock X then Y: rate $\lambda_1$

(2)  lock X only  : rate $\lambda_2$

(3)  lock Y only  : rate $\lambda_3$

Let (m n) denote a state of the system, where n is the number of lock
requests both in queue and in service at Y, and m is the number of lock
requests at X; $m = b_i$ indicates that lock X is blocked and that there are
i requests waiting in queue.

The service rates at X and Y are $\mu_X$ and $\mu_Y$ respectively.

The state transition diagram is shown in Fig. 6.2(b). Consider the
state (1 0), i.e. one request at queue X, queue Y empty. If a type 1
(rate $\lambda_1$) or type 2 (rate $\lambda_2$) request arrives, with rate
$\lambda_1+\lambda_2$, we get to the state (2 0). If a type 3 request
arrives, we get to state (1 1). If the request at queue X is a type 1
request, after it gets served at queue X, it will go to queue Y, while still
holding the lock on X, i.e. we get to the state ($b_0$ 1). Thus, even for
two files, the model becomes very complex.

An alternative approach is to approximate the network of queues by a
series of M/G/1 queues, each corresponding to the transactions waiting for
the lock on one particular file.

Consider, for example, a three-file system with lock request rates
[$\lambda_1$, (X,Y)], [$\lambda_2$, (Y,Z)], [$\lambda_3$, (X,Z)] and
[$\lambda_4$, (Z)]. (See Fig. 6.1).

Let $b_X$, $b_Y$, $b_Z$ be the in-service time for locking files X, Y, Z
individually, i.e. not including delay due to blocking. Let $a_X$, $a_Y$,
$a_Z$ be the in-service time for locking files X, Y, Z when the files must be
locked in the order X → Y → Z, i.e. including delay due to blocking. Let
$W_X$, $W_Y$, $W_Z$ denote the queueing time and $S_X$, $S_Y$, $S_Z$ denote
the total service time, i.e. queueing plus in-service time, corresponding to
the in-service time of $a_X$, $a_Y$, $a_Z$ respectively.

[Figure 6.2(a): Two file queueing network. Legend: routing probabilities.]

$a_Z = b_Z$ since requests accessing file Z will not be blocked.

$$a_Y = \begin{cases} b_Y & \text{with probability } \lambda_1/(\lambda_1+\lambda_2) \\ b_Y + S_Z & \text{with probability } \lambda_2/(\lambda_1+\lambda_2) \end{cases}$$

since a fraction $\lambda_2/(\lambda_1+\lambda_2)$ of the lock requests at
queue Y will release their locks on Y only after they get served at queue Z.

Similarly,

$$a_X = \begin{cases} b_X + S_Y & \text{with prob. } \lambda_1/(\lambda_1+\lambda_3) \\ b_X + S_Z & \text{with prob. } \lambda_3/(\lambda_1+\lambda_3) \end{cases}$$

We can find $f^T_{S_X}(s)$, $f^T_{S_Y}(s)$, $f^T_{S_Z}(s)$ by using the
Pollaczek-Khinchin transform equation [KLEI75], i.e.

$$f^T_{S_Z}(s) = f^T_{a_Z}(s) \, \frac{s(1-\rho_Z)}{s - \lambda_Z + \lambda_Z f^T_{a_Z}(s)} \qquad (6.1)$$

$$f^T_{S_Y}(s) = f^T_{a_Y}(s) \, \frac{s(1-\rho_Y)}{s - \lambda_Y + \lambda_Y f^T_{a_Y}(s)}$$

where $f^T_{a_Y}(s) = f^T_{b_Y}(s) \, \lambda_1/(\lambda_1+\lambda_2) + f^T_{b_Y}(s) f^T_{S_Z}(s) \, \lambda_2/(\lambda_1+\lambda_2)$

$$f^T_{S_X}(s) = f^T_{a_X}(s) \, \frac{s(1-\rho_X)}{s - \lambda_X + \lambda_X f^T_{a_X}(s)}$$

where $f^T_{a_X}(s) = f^T_{b_X}(s) f^T_{S_Y}(s) \, \lambda_1/(\lambda_1+\lambda_3) + f^T_{b_X}(s) f^T_{S_Z}(s) \, \lambda_3/(\lambda_1+\lambda_3)$,

and $\lambda_X = \lambda_1 + \lambda_3$, $\lambda_Y = \lambda_1 + \lambda_2$,
$\lambda_Z = \lambda_2 + \lambda_3 + \lambda_4$, $\rho_X = \lambda_X \bar{a}_X$,
$\rho_Y = \lambda_Y \bar{a}_Y$, $\rho_Z = \lambda_Z \bar{a}_Z$, with
$\bar{a}$ denoting the mean of the corresponding in-service time.

After finding the s-transforms, which must be solved in the order
$f^T_{S_Z}(s)$, $f^T_{S_Y}(s)$, $f^T_{S_X}(s)$, we can then find the expected
total service time of any transaction. For example, the transaction that
accesses files X and Y has an average service time of $S_X + S_Y$.

Note that the strategy adopted to handle the lock requests from different
transactions, say (X,Z) and (Z) transactions, is important. For example,
we may give priority to (X,Z) transactions at queue Z, since the (X,Z)
transactions hold up more resources. The model we have developed assumes
that all requests at queue Z are handled in a FCFS manner. Mathematical
models for other queueing disciplines can be developed, although they will
probably be more complex.
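
As a rough numerical companion to the three-file example, the sketch below works with means only, using the Pollaczek-Khinchin mean value formula that follows from the transform equation. It additionally assumes, for illustration, that $b_Y$ and $b_Z$ are exponential and that $b_Y$ is independent of $S_Z$; the function names and parameter values are hypothetical. Queue X is left out because it would require the second moment of $S_Y$.

```python
def pk_mean_sojourn(lam, m1, m2):
    """Mean queueing-plus-service time of an M/G/1 FCFS queue.

    lam : arrival rate, m1 : mean service time, m2 : second moment of service time.
    """
    rho = lam * m1
    if rho >= 1.0:
        raise ValueError("queue is unstable (rho >= 1)")
    return m1 + lam * m2 / (2.0 * (1.0 - rho))

def mean_total_times(l1, l2, l3, l4, mean_bY, mean_bZ):
    """Mean of S_Z and S_Y, solved in the order Z then Y.

    Assumes exponential b_Y and b_Z (an assumption of this sketch only).
    """
    lam_Z = l2 + l3 + l4
    lam_Y = l1 + l2
    # a_Z = b_Z; an exponential with mean m has second moment 2 m^2.
    ES_Z = pk_mean_sojourn(lam_Z, mean_bZ, 2.0 * mean_bZ ** 2)
    # With exponential service, queue Z is M/M/1 and S_Z is itself exponential.
    ES_Z2 = 2.0 * ES_Z ** 2
    q = l2 / (l1 + l2)  # fraction of requests at Y that go on to Z
    Ea_Y = mean_bY + q * ES_Z
    Ea_Y2 = 2.0 * mean_bY ** 2 + 2.0 * q * mean_bY * ES_Z + q * ES_Z2
    ES_Y = pk_mean_sojourn(lam_Y, Ea_Y, Ea_Y2)
    return ES_Z, ES_Y

es_z, es_y = mean_total_times(0.1, 0.1, 0.1, 0.2, mean_bY=1.0, mean_bZ=1.0)
```

The order of evaluation mirrors the text: $S_Z$ first, because $a_Y$ depends on it.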

6.2.2  Prioritized  Transactions  for  Deadlock  Prevention 

We shall now analyze the wait-die and the wound-wait systems for deadlock
prevention described in 2.4.2.

Suppose two conflicting transaction classes i and j arrive at sites i
and j with Poisson rates $\lambda_i$ and $\lambda_j$ respectively, and try to
obtain a lock on the same data item X at site c. (See Fig. 6.3). We would
like to find the probability of transaction restarts and the delays
associated with them.
Wait-die

Consider a transaction $T_i$ in transaction class i. $T_i$ will be restarted
if it tries to wait for a conflicting transaction which has higher priority.
In other words, $T_i$ will be restarted if a conflicting transaction $T_j$
with smaller timestamp (and hence higher priority) than that of $T_i$ arrives
at site c before $T_i$ does, and is still in service when $T_i$ arrives. This
scenario is depicted in Fig. 6.4(a). $t_{ic}$ is the transmission delay
between site i and site c, and is exponentially distributed (its rate is
written $\mu_{ic}$ in the formulas that follow). $S_j$ is the service time
of $T_j$ at site c. This includes the queueing time for data item X plus the
time $T_j$ holds the lock on X. Point A represents the time at which the most
recent transaction $T_j$ arrives at node j. AB therefore represents the length
of time one has to go backwards in time until one sees $T_j$. Since $T_j$
arrives in a Poisson manner, AB is exponentially distributed with mean
$1/\lambda_j$; its length is denoted $a_j$.

[Figure 6.3: Conflicting Transaction]

Hence, $P_R$ = P($T_i$ is restarted by $T_j$ at node c)

$$P_R = P(t_{jc} < a_j + t_{ic} < t_{jc} + S_j) \qquad (6.2)$$

Equ. (6.2) can be evaluated given a distribution for $S_j$. To simplify the
mathematics, we further assume that $S_j$ is exponentially distributed with
mean $1/s_j$.


There are two cases to consider: (1) $a_j < t_{jc}$ (see Fig. 6.4(a)),
and (2) $a_j > t_{jc}$ (see Fig. 6.4(b)).

Now

$$\begin{aligned}
& P(t_{jc} < a_j + t_{ic} < t_{jc} + S_j \mid a_j < t_{jc}) \\
&= P(t_{jc} - a_j < t_{ic} < t_{jc} - a_j + S_j \mid a_j < t_{jc}) \\
&= P(t_{jc} < t_{ic} < t_{jc} + S_j) \quad \text{since } (t_{jc} - a_j \mid a_j < t_{jc}) \sim t_{jc} \\
&= P(t_{ic} > t_{jc}) \, P(S_j > t_{ic} - t_{jc} \mid t_{ic} > t_{jc}) \\
&= P(t_{ic} > t_{jc}) \, P(S_j > t_{ic}) \quad \text{since } (t_{ic} - t_{jc} \mid t_{ic} > t_{jc}) \sim t_{ic} \\
&= \frac{\mu_{jc}}{\mu_{ic}+\mu_{jc}} \cdot \frac{\mu_{ic}}{\mu_{ic}+s_j}
\end{aligned}$$

and

$$\begin{aligned}
& P(t_{jc} < a_j + t_{ic} < t_{jc} + S_j \mid a_j > t_{jc}) \\
&= P(0 < a_j - t_{jc} + t_{ic} < S_j \mid a_j > t_{jc}) \\
&= P(0 < a_j + t_{ic} < S_j) \quad \text{since } (a_j - t_{jc} \mid a_j > t_{jc}) \sim a_j \\
&= P(S_j > a_j) \, P(S_j > t_{ic}) = \frac{\lambda_j}{\lambda_j+s_j} \cdot \frac{\mu_{ic}}{\mu_{ic}+s_j}
\end{aligned}$$

Here $x \sim y$ means random variables x and y have the same distribution.

Hence,

$$\begin{aligned}
P(T_i \text{ is restarted}) &= \frac{\lambda_j}{\lambda_j+\mu_{jc}} \cdot \frac{\mu_{jc}}{\mu_{ic}+\mu_{jc}} \cdot \frac{\mu_{ic}}{\mu_{ic}+s_j} + \frac{\mu_{jc}}{\lambda_j+\mu_{jc}} \cdot \frac{\lambda_j}{\lambda_j+s_j} \cdot \frac{\mu_{ic}}{\mu_{ic}+s_j} \\
&= \frac{\lambda_j \, \mu_{ic} \, \mu_{jc}}{(\lambda_j+\mu_{jc})(\mu_{ic}+s_j)} \left( \frac{1}{\mu_{ic}+\mu_{jc}} + \frac{1}{\lambda_j+s_j} \right) \qquad (6.3)
\end{aligned}$$

Note that we have simplified the problem, for even if $T_i$ is not restarted
by the most recent conflicting transaction $T_j$, it may be possible that it
is restarted by $T_j'$, the transaction that arrives at node j before $T_j$
(see Fig. 6.5), or other previous transactions arriving at node j. The
probability of these rejections can be calculated in a similar manner.
However, since these probabilities are usually very small, we make the
assumption that
P($T_i$ restarted by $T_j'$ or previous conflicting transactions | not
restarted by $T_j$) = 0.
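
A quick Monte Carlo check of the restart probability is sketched below. It samples the four exponential quantities of Equ. (6.2) directly and compares the observed frequency with the closed form of Equ. (6.3). The function names and rate values are hypothetical, and the check covers only the most recent conflicting transaction, in line with the simplification just described.

```python
import random

def restart_probability(lam_j, mu_ic, mu_jc, s_j):
    """Closed form of Equ. (6.3) for exponential delays and service time."""
    common = lam_j * mu_ic * mu_jc / ((lam_j + mu_jc) * (mu_ic + s_j))
    return common * (1.0 / (mu_ic + mu_jc) + 1.0 / (lam_j + s_j))

def simulate_restart_probability(lam_j, mu_ic, mu_jc, s_j,
                                 trials=200_000, seed=1):
    """Estimate P(t_jc < a_j + t_ic < t_jc + S_j) by direct sampling."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        a_j = rng.expovariate(lam_j)    # look-back time to T_j's arrival at j
        t_ic = rng.expovariate(mu_ic)   # delay from site i to site c
        t_jc = rng.expovariate(mu_jc)   # delay from site j to site c
        S_j = rng.expovariate(s_j)      # service time of T_j at site c
        if t_jc < a_j + t_ic < t_jc + S_j:
            hits += 1
    return hits / trials

# Hypothetical rates, chosen only to exercise the formula.
closed = restart_probability(1.0, 2.0, 3.0, 1.5)
estimate = simulate_restart_probability(1.0, 2.0, 3.0, 1.5)
```

The two values should agree to within sampling error, which is a useful sanity check when the subscripts of the formula are entered by hand.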


When a transaction is restarted, a lock reject message is sent to
the request node, which must then terminate the current transaction (in
a distributed locking algorithm, this means sending abort messages to all
nodes where it has requested locks) and resubmit a new lock request. Since
the resubmitted request will retain the original timestamp, it is possible
that the new request will conflict with the transaction that killed it before
and be killed again (see Fig. 6.6(a)).

Hence, $P_{RA}$ = P(rejection again)
= P(round trip delay to request node plus abort time
< remaining service time of $T_j$)
= $P(t_{ci} + t_{ic} + AT < S_j)$, since $S_j$ is exponential.

AT, the abort time, corresponds to the delay associated with sending abort
messages to all nodes where $T_i$ has requested locks and waiting for their
acknowledgements. In particular, if AT is exponential with mean $1/\mu_a$, then

$$P_{RA} = \frac{\mu_{ci}}{\mu_{ci}+s_j} \cdot \frac{\mu_{ic}}{\mu_{ic}+s_j} \cdot \frac{\mu_a}{\mu_a+s_j} \qquad (6.4)$$

[Figure 6.6: Finding the Expected Number of Rejections. Timeline labels:
$T_i$ arrives at i; $T_i$ arrives at C and is rejected; round trip delay to
node i plus abort time; resubmitted $T_i$ arrives and is rejected again.
Panel (b): Expected number of rejections.]

The probability of this happening yet again is the same because of
the self-renewing property of the exponential distribution.

For each transaction $T_i$,

$$\begin{aligned}
E(\text{number of rejections}) &= (1 - P_R) \cdot 0 + P_R (1 - P_{RA}) \sum_{k=1}^{\infty} k \, P_{RA}^{\,k-1} \quad \text{(see Fig. 6.6(b))} \\
&= \frac{P_R (1 - P_{RA})}{(1 - P_{RA})^2} \\
&= \frac{P_R}{1 - P_{RA}} \qquad (6.5)
\end{aligned}$$

The additional delay due to each rejection = $t_{ci} + t_{ic} + AT$.
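
The closed forms (6.4) and (6.5) are easy to evaluate numerically. The sketch below uses hypothetical function names and rate values, and checks the geometric-series step of Equ. (6.5) against a truncated sum. The mean delay per rejection assumes exponential $t_{ci}$, $t_{ic}$ and AT with the rates shown.

```python
def reject_again_probability(mu_ci, mu_ic, mu_a, s_j):
    """Equ. (6.4): P(t_ci + t_ic + AT < S_j) for independent exponentials."""
    return (mu_ci / (mu_ci + s_j)) * (mu_ic / (mu_ic + s_j)) * (mu_a / (mu_a + s_j))

def expected_rejections(p_r, p_ra):
    """Equ. (6.5): expected number of rejections of one transaction."""
    return p_r / (1.0 - p_ra)

def mean_delay_per_rejection(mu_ci, mu_ic, mu_a):
    """Mean of t_ci + t_ic + AT when each term is exponential."""
    return 1.0 / mu_ci + 1.0 / mu_ic + 1.0 / mu_a

# Hypothetical values.
p_r = 0.25
p_ra = reject_again_probability(mu_ci=2.0, mu_ic=2.0, mu_a=4.0, s_j=1.0)
closed = expected_rejections(p_r, p_ra)

# Truncated version of the sum in Equ. (6.5), as a consistency check.
series = sum(k * p_r * p_ra ** (k - 1) * (1.0 - p_ra) for k in range(1, 500))
```

The first rejection occurs with probability $P_R$, and every later one with probability $P_{RA}$, which is why the count is a geometric series scaled by $P_R$.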

Wound-wait

Consider a transaction $T_j$. It will be wounded by a conflicting transaction
$T_i$ if $T_i$ has higher priority and tries to wait for $T_j$. Fig. 6.7
shows what happens.

Therefore, P($T_j$ is wounded) = $P_W$
= P($T_j$ is wounded by $T_i$, the most recent conflicting transaction
having timestamp earlier than $T_j$)

$$\begin{aligned}
P_W &= P(a_i + t_{jc} < t_{ic} < a_i + t_{jc} + S_j) \\
&= P(a_i + t_{jc} < t_{ic}) \cdot P(t_{ic} < a_i + t_{jc} + S_j \mid a_i + t_{jc} < t_{ic}) \\
&= P(a_i + t_{jc} < t_{ic}) \cdot P(t_{ic} - a_i - t_{jc} < S_j \mid a_i + t_{jc} < t_{ic}) \\
&= \frac{\lambda_i}{\lambda_i+\mu_{ic}} \cdot \frac{\mu_{jc}}{\mu_{jc}+\mu_{ic}} \cdot \frac{\mu_{ic}}{\mu_{ic}+s_j} \qquad (6.6)
\end{aligned}$$

Note that we are again ignoring the possibility that $T_j$ may be wounded
by previous transactions, i.e. those having earlier timestamps than $T_i$.
Formally, we are assuming that
P($T_j$ wounded by previous transactions | $T_j$ not wounded by $T_i$) = 0.

[Figure 6.7: Probability $T_j$ is wounded by $T_i$ under Wound-wait. Timeline
labels: $T_i$ arrives at i; $T_j$ arrives at j; $T_j$ arrives at C and enters
queue; $T_i$ arrives at C; $T_j$ releases lock; $W_j$ = wasted service time
due to $T_j$ being wounded.]

Now, we make the additional assumption that all wounded transactions
are eventually restarted, i.e. when a wound message arrives, the transaction
has not reached the commit phase of the two-phase commit. In this case,
the wounded transaction will be resubmitted, with the original timestamp.

In wound-wait, in contrast to wait-die, even if the resubmitted transaction
conflicts with the transaction that wounded it, it will not be wounded again.
This is because the resubmitted transaction is now the requestor and, having
lower priority, it is allowed to wait.

For each transaction $T_j$, E(number of restarts) = $P_W$ (see Equ. (6.6)).

The additional delay due to each wound (see Fig. 6.7)

$$\begin{aligned}
&= t_{cj} + t_{jc} + W_j \\
&= t_{cj} + t_{jc} + (t_{ic} - a_i - t_{jc} \mid a_i + t_{jc} < t_{ic} < a_i + t_{jc} + S_j) \\
&= t_{cj} + t_{jc} + \min(t_{ic}, S_j) \qquad (6.7)
\end{aligned}$$

Given $t_{ic} - a_i - t_{jc} > 0$, $t_{ic} - a_i - t_{jc}$ has the same
distribution as $t_{ic}$. $W_j$ is thus an exponential restricted to be less
than $S_j$, another exponential. The derivation in Appendix I shows that $W_j$
has the same distribution as $\min(t_{ic}, S_j)$.
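
Both the wound probability of Equ. (6.6) and the claim about $W_j$ can be checked by sampling. The sketch below assumes independent exponential delays and service time, with hypothetical function names and rates. It compares the observed wound frequency with the closed form, and the observed mean wasted time with $1/(\mu_{ic}+s_j)$, the mean of the minimum of two independent exponentials.

```python
import random

def wound_probability(lam_i, mu_ic, mu_jc, s_j):
    """Equ. (6.6) for independent exponential delays and service time."""
    return ((lam_i / (lam_i + mu_ic))
            * (mu_jc / (mu_jc + mu_ic))
            * (mu_ic / (mu_ic + s_j)))

def simulate_wound(lam_i, mu_ic, mu_jc, s_j, trials=200_000, seed=7):
    """Return (estimated P_W, mean wasted service time W_j given a wound)."""
    rng = random.Random(seed)
    wounded, wasted = 0, 0.0
    for _ in range(trials):
        a_i = rng.expovariate(lam_i)
        t_ic = rng.expovariate(mu_ic)
        t_jc = rng.expovariate(mu_jc)
        S_j = rng.expovariate(s_j)
        w = t_ic - a_i - t_jc  # time T_j has been at site c when T_i arrives
        if 0.0 < w < S_j:      # T_i arrives while T_j is still in service
            wounded += 1
            wasted += w
    return wounded / trials, wasted / wounded

# Hypothetical rates.
p_w = wound_probability(2.0, 1.0, 3.0, 1.5)
p_w_est, mean_w_est = simulate_wound(2.0, 1.0, 3.0, 1.5)
mean_min = 1.0 / (1.0 + 1.5)  # E[min(t_ic, S_j)] for rates mu_ic = 1.0, s_j = 1.5
```

The agreement of `mean_w_est` with `mean_min` reflects the fact that, for independent exponentials, the minimum is independent of which variable attains it.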

Note that in using prioritized transactions for deadlock prevention, a
transaction can start queueing for all files that it wants to access
simultaneously, in contrast to the case of ordered queues where it has to wait
for file X first, then Y, then Z, etc. Suppose a transaction has to lock
both files X and Y; then the time it has to wait until both locks are
granted, provided the lock requests are not rejected, is given by
$D = \max(w_X, w_Y)$, where $w_X$, $w_Y$ are the queueing times at queues X
and Y respectively.

In particular, if the service times (not including queueing) at the queues
are exponential with means $1/\mu_X$ and $1/\mu_Y$ respectively, then E(D) is
found in Appendix II to be

$$E(D) = \frac{\rho_X}{\mu_X(1-\rho_X)} + \frac{\rho_Y}{\mu_Y(1-\rho_Y)} - \frac{\rho_X \rho_Y}{\mu_X(1-\rho_X) + \mu_Y(1-\rho_Y)}$$

where $\rho_X = \lambda_X/\mu_X$ and $\rho_Y = \lambda_Y/\mu_Y$.

Similarly, if a transaction has to request locks on files W, X, ..., Z,
then $D = \max(w_W, w_X, \ldots, w_Z)$ and again E(D) is given by an
expression derived in Appendix II.
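
The two-file expression for E(D) can be checked numerically. The sketch below assumes that the two queues behave as independent M/M/1 queues in steady state, so that each queueing time is zero with probability $1-\rho$ and otherwise exponential with rate $\mu-\lambda$. That assumption, the function names and the rates are illustrative only and are not taken from Appendix II.

```python
import random

def expected_max_wait(lam_x, mu_x, lam_y, mu_y):
    """Closed form for E[max(w_X, w_Y)] with independent M/M/1 queueing times."""
    rho_x, rho_y = lam_x / mu_x, lam_y / mu_y
    ax, ay = mu_x * (1.0 - rho_x), mu_y * (1.0 - rho_y)
    return rho_x / ax + rho_y / ay - rho_x * rho_y / (ax + ay)

def sample_mm1_wait(rng, lam, mu):
    """One stationary M/M/1 queueing time (excluding service)."""
    if rng.random() >= lam / mu:
        return 0.0                      # arrival finds the queue empty
    return rng.expovariate(mu - lam)    # otherwise exponential with rate mu - lam

def simulate_expected_max_wait(lam_x, mu_x, lam_y, mu_y,
                               trials=200_000, seed=3):
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += max(sample_mm1_wait(rng, lam_x, mu_x),
                     sample_mm1_wait(rng, lam_y, mu_y))
    return total / trials

# Hypothetical rates.
closed = expected_max_wait(1.0, 2.0, 1.5, 2.5)
estimate = simulate_expected_max_wait(1.0, 2.0, 1.5, 2.5)
```

The subtracted term is the mean of $\min(w_X, w_Y)$, since $\max = w_X + w_Y - \min$.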


6.2.3  Probability  of  Deadlocks 

Another way to solve the deadlock problem is deadlock detection. As
mentioned previously (section 2.4.1), this is practical only for Centralized
Locking Algorithms. Periodically, the deadlock detector, which is located
at the central node, will construct the waits-for graph and determine if
there are any deadlocks. When a deadlock is detected, one of the transactions
is restarted to break the deadlock.

Therefore,  one  important  parameter  in  our  conflict  model  is  the  probability 
of  deadlocks.  For  each  transaction,  we  must  find  (1)  the  probability  that 
it  will  be  involved  in  a  deadlock  with  other  transactions,  and  (2)  the  expected 
delay  due  to  this  deadlock. 

In  the  following  analysis,  we  shall  consider  deadlocks  involving  only 
two  transactions.  In  addition,  we  make  the  following  assumptions: 

(1)  The Transaction Processing Model says that transactions will be processed
in two steps: a query processing step, and a write step. Thus, when
a transaction T arrives at the central node, it will request locks on
all files in its readset. It is assumed that T starts queueing at all
files in its readset simultaneously. T next performs query processing
after which it will request locks on all files in its writeset. Again
T will start queueing at all files in its writeset simultaneously.

(2)  The elapsed time between when T requests locks on its readset and
writeset is assumed to be exponentially distributed.

Consider Fig. 6.8(a) in which two classes of transactions try to obtain
locks from the central node. Class 1 transactions have a readset consisting
of file X and a writeset consisting of file Y while class 2 transactions
have file Y as the readset and file X as the writeset, thereby creating a
potential deadlock. There are two cases to consider: (1) Class 1 transaction
arrives at the central node first, and (2) Class 2 transaction arrives first.

Suppose a Class 1 transaction $T_1$ arrives first (Fig. 6.8(b)), with
probability $\lambda_1/(\lambda_1+\lambda_2)$. According to the Transaction
Processing Model, when $T_1$ arrives at the central node, it will request to
lock its readset, namely file X. Upon obtaining the lock on X, it will perform
query processing, which in this case corresponds to reading file X. Then it
will request a lock on file Y, its writeset. The time between when $T_1$
requests locks on X and Y is represented by AB in Fig. 6.8(b).

Any Class 2 transaction arriving after time A will try to access
file Y and must wait until $T_1$ is completed. In addition, if $T_2$ arrives
at time C before $T_1$ requests the lock on file Y, i.e. during the period
represented by AB in Fig. 6.8(b), then when $T_1$ wants to access file Y, it
must wait for $T_2$. A deadlock is created and the probability of deadlock,
$P_{DL}$, is given by:

$$P_{DL} = P(AC < AB) = \lambda_2/(\lambda_2+\mu_1) \qquad (6.8)$$

where $1/\mu_1 = E(AB)$ and AB is exponentially distributed.

The deadlock detector at the central node constructs the waits-for
graph periodically and takes time BD (Fig. 6.8(b)) to detect the deadlock.

[Figure 6.8: Probability of Deadlocks. (a) Two conflicting classes of
transactions arrive at central node. (b) A deadlock is created. Timeline
labels: A, $T_1$ arrives and requests lock on X; C, $T_2$ arrives and requests
lock; B, $T_1$ requests lock on Y and the deadlock is created; D, deadlock is
detected.]

If the waits-for graph is constructed every S seconds, say, then E(BD)
= S/2 seconds. After the deadlock is detected, the deadlock detector will
break the deadlock by restarting one of the deadlocked transactions. In
this case, in order to minimize wasted resources, it will restart $T_2$, since
$T_2$ has barely started while $T_1$ has already finished its query processing
step. Therefore, given $T_1$ arrives first, the expected delay for $T_1$ due to
deadlock = E(BD) = S/2, while the expected delay for $T_2$ due to deadlock
= E(CB) + E(BD) = E(AB - AC | AB > AC) + E(BD) = E(AB) + E(BD) = $1/\mu_1$ + S/2.

The symmetric situation of $T_2$ arriving at the central node first
(with probability $\lambda_2/(\lambda_1+\lambda_2)$) is completely analogous.
Thus, given $T_2$ arrives first, P(deadlock) = $\lambda_1/(\lambda_1+\mu_2)$,
where $1/\mu_2$ is the expected elapsed time between when $T_2$ requests locks
on its readset and its writeset. In this case $T_1$ is the transaction that
is restarted, so E(delay due to deadlock) = S/2 for $T_2$ and $1/\mu_2$ + S/2
for $T_1$.


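Hence,

$$P(\text{deadlock between } T_1 \text{ and } T_2) = P(\text{deadlock} \mid T_1 \text{ arrives first})\,P(T_1 \text{ arrives first}) + P(\text{deadlock} \mid T_2 \text{ arrives first})\,P(T_2 \text{ arrives first})$$

$$= \frac{\lambda_2}{\lambda_2+\mu_1}\cdot\frac{\lambda_1}{\lambda_1+\lambda_2} + \frac{\lambda_1}{\lambda_1+\mu_2}\cdot\frac{\lambda_2}{\lambda_1+\lambda_2}$$

and

$$E(\text{delay due to deadlock for } T_1) = E(\text{delay for } T_1 \mid T_1 \text{ arrives first, deadlock occurs})\,P(T_1 \text{ arrives first, deadlock occurs}) + E(\text{delay for } T_1 \mid T_2 \text{ arrives first, deadlock occurs})\,P(T_2 \text{ arrives first, deadlock occurs})$$

$$= \frac{S}{2}\cdot\frac{\lambda_1}{\lambda_1+\lambda_2}\cdot\frac{\lambda_2}{\lambda_2+\mu_1} + \left(\frac{1}{\mu_2}+\frac{S}{2}\right)\frac{\lambda_2}{\lambda_1+\lambda_2}\cdot\frac{\lambda_1}{\lambda_1+\mu_2} \tag{6.10}$$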
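Similarly,

$$E(\text{delay due to deadlock for } T_2) = \frac{S}{2}\cdot\frac{\lambda_2}{\lambda_1+\lambda_2}\cdot\frac{\lambda_1}{\lambda_1+\mu_2} + \left(\frac{1}{\mu_1}+\frac{S}{2}\right)\frac{\lambda_1}{\lambda_1+\lambda_2}\cdot\frac{\lambda_2}{\lambda_2+\mu_1} \tag{6.11}$$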
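As a numerical check of Equ. (6.10) and (6.11), the following Python sketch evaluates the deadlock probability and the two expected delays. The function and argument names are illustrative and not taken from the thesis: `lam1` and `lam2` are the arrival rates of the two classes, `1/mu1` and `1/mu2` are the expected times between the readset and writeset lock requests, and `s` is the period at which the waits-for graph is constructed.

```python
def deadlock_probability(lam1, lam2, mu1, mu2):
    """P(deadlock between T1 and T2), conditioning on which one arrives first."""
    p_t1_first = lam1 / (lam1 + lam2)
    p_t2_first = lam2 / (lam1 + lam2)
    return (lam2 / (lam2 + mu1)) * p_t1_first + (lam1 / (lam1 + mu2)) * p_t2_first


def deadlock_delay_t1(lam1, lam2, mu1, mu2, s):
    """Expected delay of T1 due to deadlock with T2, following Equ. (6.10)."""
    # T1 first and deadlock: T2 is restarted, T1 only waits for detection.
    t1_first = (lam1 / (lam1 + lam2)) * (lam2 / (lam2 + mu1))
    # T2 first and deadlock: T1 is restarted and also waits for detection.
    t2_first = (lam2 / (lam1 + lam2)) * (lam1 / (lam1 + mu2))
    return (s / 2) * t1_first + (1 / mu2 + s / 2) * t2_first


def deadlock_delay_t2(lam1, lam2, mu1, mu2, s):
    """Expected delay of T2, Equ. (6.11): Equ. (6.10) with the roles swapped."""
    return deadlock_delay_t1(lam2, lam1, mu2, mu1, s)
```

With $\lambda_1 = .3$, $\lambda_2 = .1$, $1/\mu_1 = .2558$, $1/\mu_2 = .3291$ and $S = 1$ (the Class 1 / Class 4 figures of the example in Chapter 7), `deadlock_delay_t1` evaluates to about .028 sec.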


Note that under the assumptions of our deadlock model listed at the beginning of this section, a read only transaction (i.e. a transaction with an empty writeset) will not enter into a deadlock with another transaction. Consider two transactions $T_1$ and $T_2$, in which $T_1$ has non-empty readset and writeset and $T_2$ has empty writeset. Suppose $T_1$ arrives first and queues at all files in its readset. After a certain time $T_2$ arrives. Since two read requests do not conflict, $T_2$ can be completed without having to wait for $T_1$ to complete. Suppose $T_2$ arrives first. Again $T_2$ does not have to wait for $T_1$. Therefore, no deadlock is possible.

A write only transaction (i.e. a transaction with empty readset) may enter into a deadlock with another transaction. Consider two transactions $T_1$ and $T_2$ in which $T_1$ has readset (X) and writeset (Y) and $T_2$ has an empty readset, and writeset (X,Y). Suppose $T_1$ arrives first and locks X. After a time $T_2$ arrives; it joins the queue for file X and locks file Y. When $T_1$ performs the write step, it cannot access file Y and must wait for $T_2$. A deadlock is thereby created. On the other hand, if $T_2$ arrives first, then it does not have to wait for $T_1$ and no deadlock is possible.


6.3  Timestamp  Ordering  (SDD-1)

The conflict model of SDD-1 attempts to determine two important parameters: (1) $p_\alpha$, the probability that a read message will be rejected because of an obsolete (or reversed) timestamp, and (2) given that a read message is not rejected, the time it has to wait before the read condition is satisfied and it can be processed.
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6.3.1  Probability  of  Read  Rejection 

Consider the simplest case of two TM's shown in Fig. 6.9. Transactions $i_\alpha$ arrive at $TM_\alpha$ with Poisson rate $\lambda_\alpha$ while conflicting transactions $i_\beta$ arrive at $TM_\beta$ with rate $\lambda_\beta$. Assume that, upon reaching $TM_\alpha$ and $TM_\beta$, $i_\alpha$ and $i_\beta$ take respectively time $t_\alpha$ and $t_\beta$ to get to $DM_\alpha$. These times include both the queueing and transmission times at the respective channels. Suppose we choose an arbitrary $i_\alpha$, and consider the time we have to wait until we see the next arrival of an $i_\beta$ at $TM_\beta$. Call this waiting time $a_\beta$. Due to the memoryless property of Poisson processes, we note that $a_\beta$ is exponential with rate $\lambda_\beta$. An inspection of Fig. 6.9(b) gives us the following expression for $p_\alpha$:

$$p_\alpha = P(i_\alpha \text{ will arrive at } DM_\alpha \text{ later than } i_\beta, \text{ given that } i_\alpha \text{ enters the database system at } TM_\alpha \text{ before } i_\beta \text{ enters the system at } TM_\beta)$$

$$= P(t_\alpha > a_\beta + t_\beta) = P(a_\beta < t_\alpha - t_\beta)$$

Case 1: Suppose that $t_\alpha$ and $t_\beta$ are constants. Then

$$p_\alpha = \begin{cases} 1 - e^{-\lambda_\beta (t_\alpha - t_\beta)} & \text{if } t_\alpha > t_\beta \\ 0 & \text{otherwise} \end{cases}$$

Case 2: The lengths of messages $i_\alpha$ and $i_\beta$ are exponentially distributed with mean $1/\mu_\alpha$ and $1/\mu_\beta$.

In this case, the analysis of Case 1 is still applicable, except that $t_\alpha - t_\beta$ is not a constant anymore.

Recall that if $y$ is the total time in the system (queueing plus service) for an M/M/1 queue, then

$$f_y(y_0) = (\mu - \lambda)\, e^{-(\mu - \lambda) y_0}, \quad y_0 \ge 0$$

where $\lambda$ is the arrival rate and $\mu$ is the service rate. Therefore $t_\alpha$ and $t_\beta$
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(a)  Conflicting  Transactions  arriving  at  DM^ 


(b)  Example  of  Read  Rejection 


Figure  6.9  Probability  of  Read  Rejection 
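are exponential with rates $\mu_\alpha - \lambda_\alpha$ and $\mu_\beta - \lambda_\beta$ respectively. Moreover, the pdf of $T = t_\alpha - t_\beta$, given that $t_\alpha > t_\beta$, will be given by

$$f_{T \mid T > 0}(T_0) = (\mu_\alpha - \lambda_\alpha)\, e^{-(\mu_\alpha - \lambda_\alpha) T_0}, \quad T_0 \ge 0$$

due to the memoryless property of the exponential distribution.

Therefore,

$$p_\alpha = P(t_\alpha > t_\beta)\, P(t_\alpha - t_\beta > a_\beta \mid t_\alpha > t_\beta) = \frac{\mu_\beta - \lambda_\beta}{\mu_\alpha - \lambda_\alpha + \mu_\beta - \lambda_\beta} \cdot \frac{\lambda_\beta}{\lambda_\beta + \mu_\alpha - \lambda_\alpha}$$

Case 3: General Case - More than two TM's, messages with exponential lengths.

Let $G = \{\beta, \gamma, \ldots\}$ be the set of subscripts of those TM's that send messages to $DM_\alpha$ which conflict with message $i_\alpha$. Consider the TM pairs $(TM_\alpha, TM_\beta), (TM_\alpha, TM_\gamma), \ldots$ in a similar fashion to the analysis in Case 2. The probability of rejection due to each $TM_g$, $g \in G$, can be found as above. Since the arrivals of messages at the TM's are independent,

$$p_\alpha = P(i_\alpha \text{ will be read rejected}) = 1 - P(i_\alpha \text{ will not be read rejected})$$

$$= 1 - \prod_{g \in G} P(i_\alpha \text{ will not be read rejected by messages from } TM_g)$$

$$= 1 - \prod_{g \in G} \left[ 1 - \frac{\mu_g - \lambda_g}{\mu_g - \lambda_g + \mu_\alpha - \lambda_\alpha} \cdot \frac{\lambda_g}{\lambda_g + \mu_\alpha - \lambda_\alpha} \right]$$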

6.3.2  Delay  due  to  conflicts 

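What is the read delay given that a read message is not rejected?

Case 1: Two TM's, exponential message lengths.

An inspection of Fig. 6.10 gives

$$W_\alpha = (a_\beta + t_\beta - t_\alpha \mid a_\beta + t_\beta \ge t_\alpha) = \begin{cases} (a_\beta - t_\alpha) + t_\beta & \text{if } a_\beta \ge t_\alpha \\ t_\beta - (t_\alpha - a_\beta) & \text{if } a_\beta < t_\alpha \le a_\beta + t_\beta \end{cases}$$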
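Therefore,

$$W_\alpha \sim \begin{cases} a_\beta + t_\beta & \text{with probability } (\mu_\alpha - \lambda_\alpha)/(\lambda_\beta + \mu_\alpha - \lambda_\alpha) \\ t_\beta & \text{with probability } \lambda_\beta/(\lambda_\beta + \mu_\alpha - \lambda_\alpha) \end{cases} \tag{6.13}$$

since $a_\beta - t_\alpha \sim a_\beta$ if $a_\beta > t_\alpha$, and $t_\beta - (t_\alpha - a_\beta) \sim t_\beta$ if $t_\alpha - a_\beta \ge 0$ and $t_\beta > t_\alpha - a_\beta$.

Hence,

$$f_{W_\alpha}(x) = \frac{\mu_\alpha - \lambda_\alpha}{\lambda_\beta + \mu_\alpha - \lambda_\alpha}\, f_u(x) + \frac{\lambda_\beta}{\lambda_\beta + \mu_\alpha - \lambda_\alpha}\, f_{t_\beta}(x), \quad x \ge 0$$

where $u = a_\beta + t_\beta$.

$$E(W_\alpha) = \frac{\mu_\alpha - \lambda_\alpha}{\lambda_\beta + \mu_\alpha - \lambda_\alpha}\, E(a_\beta + t_\beta) + \frac{\lambda_\beta}{\lambda_\beta + \mu_\alpha - \lambda_\alpha}\, E(t_\beta)$$

$$= E(t_\beta) + \frac{\mu_\alpha - \lambda_\alpha}{\lambda_\beta + \mu_\alpha - \lambda_\alpha}\, E(a_\beta) = \frac{1}{\mu_\beta - \lambda_\beta} + \frac{\mu_\alpha - \lambda_\alpha}{\lambda_\beta + \mu_\alpha - \lambda_\alpha} \cdot \frac{1}{\lambda_\beta} \tag{6.14}$$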
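Equ. (6.14) can be evaluated directly. The following Python sketch is a minimal illustration; the function and parameter names are chosen here for readability and do not appear in the thesis.

```python
def expected_read_wait(mu_a, lam_a, mu_b, lam_b):
    """E(W_alpha) for two TM's with exponential message lengths, Equ. (6.14)."""
    rate_a = mu_a - lam_a  # rate of t_alpha
    rate_b = mu_b - lam_b  # rate of t_beta
    # Probability that the read reaches DM before the next conflicting
    # write has even arrived at TM_beta (a_beta > t_alpha).
    p_wait_for_arrival = rate_a / (lam_b + rate_a)
    return 1.0 / rate_b + p_wait_for_arrival / lam_b
```

For $\mu_\alpha = \mu_\beta = 10$ and $\lambda_\alpha = \lambda_\beta = 2$ this gives $1/8 + (8/10)(1/2) = 0.525$ time units.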


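Case 2: If we have more than two TM's, then we must wait until the arrival of all conflicting writes with bigger timestamp than $i_\alpha$.

Therefore,

$$W_\alpha = \max_{g \in G}\, (a_g + t_g - t_\alpha \mid a_g + t_g \ge t_\alpha)$$

where $G = \{\beta, \gamma, \ldots\}$ = set of subscripts of all TM's that send messages conflicting with $i_\alpha$ to $DM_\alpha$, $a_g$ is the interarrival time of messages at $TM_g$ and $t_g$ is the time these messages take to get to $DM_\alpha$.

Let $x_g = (a_g + t_g - t_\alpha \mid a_g + t_g \ge t_\alpha)$; then $W_\alpha = \max_{g \in G}(x_g)$.

Let $v = a_g + t_g$. Since $a_g$ and $t_g$ are independent,

$$f_v(x) = \frac{\lambda_1 \lambda_2}{\lambda_2 - \lambda_1}\, e^{-\lambda_1 x} + \frac{\lambda_1 \lambda_2}{\lambda_1 - \lambda_2}\, e^{-\lambda_2 x}, \quad x \ge 0$$

where $\lambda_1 = \lambda_g$ and $\lambda_2 = \mu_g - \lambda_g$.

Now,

$$x_g \sim \begin{cases} a_g + t_g & \text{with probability } (\mu_\alpha - \lambda_\alpha)/(\lambda_g + \mu_\alpha - \lambda_\alpha) \\ t_g & \text{with probability } \lambda_g/(\lambda_g + \mu_\alpha - \lambda_\alpha) \end{cases}$$

(see Equ. (6.13)).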
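Therefore, with $\lambda_1 = \lambda_g$ and $\lambda_2 = \mu_g - \lambda_g$,

$$f_{x_g}(x) = \frac{\mu_\alpha - \lambda_\alpha}{\lambda_g + \mu_\alpha - \lambda_\alpha} \left( \frac{\lambda_1 \lambda_2}{\lambda_2 - \lambda_1}\, e^{-\lambda_1 x} + \frac{\lambda_1 \lambda_2}{\lambda_1 - \lambda_2}\, e^{-\lambda_2 x} \right) + \frac{\lambda_g}{\lambda_g + \mu_\alpha - \lambda_\alpha}\, \lambda_2 e^{-\lambda_2 x}, \quad x \ge 0$$

$$F_{x_g}(x) = P(x_g \le x) = \frac{\mu_\alpha - \lambda_\alpha}{\lambda_g + \mu_\alpha - \lambda_\alpha} \cdot \frac{\lambda_2}{\lambda_2 - \lambda_1} \left( 1 - e^{-\lambda_1 x} \right) + \left( \frac{\mu_\alpha - \lambda_\alpha}{\lambda_g + \mu_\alpha - \lambda_\alpha} \cdot \frac{\lambda_1}{\lambda_1 - \lambda_2} + \frac{\lambda_g}{\lambda_g + \mu_\alpha - \lambda_\alpha} \right) \left( 1 - e^{-\lambda_2 x} \right), \quad x \ge 0$$

Now,

$$F_{W_\alpha}(x) = P(W_\alpha \le x) = P(\max_{g \in G} x_g \le x) = \prod_{g \in G} P(x_g \le x)$$

Hence, $f_{W_\alpha}(x) = F'_{W_\alpha}(x)$, and

$$E(W_\alpha) = \int_0^\infty \left( 1 - F_{W_\alpha}(x) \right) dx.$$

Since $F_{x_g}(x)$ is known for all $g \in G$, the last two expressions can be evaluated.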
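One way to evaluate the last expression is plain numerical integration. The Python sketch below uses the trapezoidal rule on $1 - \prod_g F_{x_g}(x)$; the function names, the integration limit `upper` and the number of steps are assumptions made for this illustration (the limit must be large compared with the mean delays involved), and the closed form of $F_{x_g}$ requires $\lambda_g \ne \mu_g - \lambda_g$.

```python
import math


def cdf_x_g(x, mu_a, lam_a, mu_g, lam_g):
    """F_{x_g}(x) for one conflicting TM; assumes lam_g != mu_g - lam_g."""
    rate_a = mu_a - lam_a
    l1, l2 = lam_g, mu_g - lam_g
    p_sum = rate_a / (lam_g + rate_a)  # weight of the a_g + t_g branch
    p_t = lam_g / (lam_g + rate_a)     # weight of the t_g branch
    term1 = p_sum * l2 / (l2 - l1) * (1.0 - math.exp(-l1 * x))
    term2 = (p_sum * l1 / (l1 - l2) + p_t) * (1.0 - math.exp(-l2 * x))
    return term1 + term2


def expected_wait_many(mu_a, lam_a, conflicting, upper=60.0, steps=60000):
    """E(W_alpha) as the integral of 1 - prod_g F_{x_g}(x) (trapezoidal rule)."""
    def tail(x):
        prod = 1.0
        for mu_g, lam_g in conflicting:
            prod *= cdf_x_g(x, mu_a, lam_a, mu_g, lam_g)
        return 1.0 - prod

    h = upper / steps
    total = 0.5 * (tail(0.0) + tail(upper))
    for k in range(1, steps):
        total += tail(k * h)
    return total * h
```

With a single conflicting TM the integral agrees with Equ. (6.14); adding a second conflicting TM with the same rates increases the expected wait, as the maximum of two delays must.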


6.3.3  Optimal  Read  Conditions 

In deriving $p_\alpha$ and $W_\alpha$ in the last two sections, we have assumed that the timestamp in the read condition, call it $TS_o$, is the same as the time when the transaction arrives at $TM_\alpha$ ($TS_i$). Note that this is true only for transactions running under protocol P3. For messages running under protocols P1 and P2, the timestamp in the read condition can be chosen arbitrarily by the TM, the only requirement being that all RR messages sent from the same TM on behalf of a transaction have the same read condition. In particular, $TM_\alpha$ can choose $TS_o = TS_i + x$, or $TS_o = TS_i - x$, where x is an arbitrary constant. In [BSR80], the builders of SDD-1 point out that the choice of this read condition timestamp is an important parameter that must be fine-tuned since the efficiency of the system depends on it. Too


[Figure 6.11: timelines from the arrival of $i_\alpha$ at $TM_\alpha$ to its arrival at $DM_\alpha$. (a) Case 1: $TS_o = TS_i + x$. (b) Case 2: $TS_o = TS_i - x$.]
small a read condition timestamp will lead to a lot of RR messages being read rejected, while too large a read condition timestamp will incur an excessive $W_\alpha$, the waiting time until a conflicting write message with timestamp greater than $TS_o$ arrives. We now formulate an optimization problem to determine the best value of $TS_o$ under various situations. This approach will probably be better than trying to arrive at a good $TS_o$ by trial and error. We shall consider the case of two conflicting TM's only. The result can be extended to the general case of many conflicting TM's, in a fashion similar to that described in the last section.

Case 1: $TS_o = TS_i + x$

Inspection of Fig. 6.11(a) gives

$$p_\alpha = P(i_\alpha \text{ arrives at } DM_\alpha \text{ after } i_\beta \mid i_\beta \text{ has timestamp greater than } TS_o) = P(t_\alpha > a_\beta + t_\beta \mid a_\beta \ge x)$$

$$W_\alpha = (a_\beta + t_\beta - t_\alpha \mid a_\beta \ge x,\ a_\beta + t_\beta \ge t_\alpha)$$

The condition $a_\beta \ge x$ assures that the timestamp of $i_\beta$ is at least $TS_o$.

$p_\alpha$ can be rewritten as:

$$p_\alpha = P(a_\beta < T \mid a_\beta \ge x) \quad \text{where } T = t_\alpha - t_\beta$$

$a_\beta$ is the remaining time one has to wait until the arrival of $i_\beta$ at $TM_\beta$, given that one enters the system at a random time. For a Poisson process, this remaining time due to random incidence is still exponential with the same parameter. Therefore,

$$f_{a_\beta \mid a_\beta \ge x}(a \mid a \ge x) = \lambda_\beta\, e^{-\lambda_\beta (a - x)}, \quad a \ge x$$

For $T \ge x$,

$$p_\alpha = \int_x^T \lambda_\beta\, e^{-\lambda_\beta (a - x)}\, da = 1 - e^{-\lambda_\beta T} e^{\lambda_\beta x} \tag{6.15}$$

For $T < x$, $p_\alpha = 0$.
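Since T is a random variable, we have to integrate Equ. (6.15) over the pdf of T. Now

$$f_{T \mid T \ge x}(T_0 \mid T_0 \ge x) = \mu_1 e^{-\mu_1 (T_0 - x)}, \quad T_0 \ge x$$

where $\mu_1 = \mu_\alpha - \lambda_\alpha$ is the rate of $t_\alpha$. Therefore, for the purpose of this integration,

$$f_T(T_0) = \begin{cases} f_{T \mid T \ge x}(T_0 \mid T_0 \ge x)\, P(T \ge x) & \text{for } T_0 \ge x \\ 0 & \text{for } T_0 < x \end{cases} \tag{6.16}$$

We next determine $P(T \ge x) = P(t_\alpha \ge t_\beta + x)$.

Consider the joint pdf space of $t_\alpha$ and $t_\beta$ (Fig. 6.12). Let $\mu_1 = \mu_\alpha - \lambda_\alpha$ and $\mu_2 = \mu_\beta - \lambda_\beta$; then

$$P(t_\alpha \ge t_\beta + x) = \text{shaded area} = \int_0^\infty \int_{t_\beta + x}^\infty \mu_1 e^{-\mu_1 t_\alpha}\, \mu_2 e^{-\mu_2 t_\beta}\, dt_\alpha\, dt_\beta = \frac{\mu_2\, e^{-\mu_1 x}}{\mu_1 + \mu_2}$$

From Equ. (6.16),

$$f_T(T_0) = \mu_1 e^{-\mu_1 (T_0 - x)} \cdot \frac{\mu_2\, e^{-\mu_1 x}}{\mu_1 + \mu_2} = \frac{\mu_1 \mu_2}{\mu_1 + \mu_2}\, e^{-\mu_1 T_0}, \quad T_0 \ge x$$

therefore

$$p_\alpha = \int_x^\infty \left( 1 - e^{-\lambda_\beta (T_0 - x)} \right) f_T(T_0)\, dT_0 = \frac{\mu_\beta - \lambda_\beta}{\mu_\alpha - \lambda_\alpha + \mu_\beta - \lambda_\beta} \cdot \frac{\lambda_\beta}{\lambda_\beta + \mu_\alpha - \lambda_\alpha}\, e^{-(\mu_\alpha - \lambda_\alpha) x} \tag{6.17}$$

Equ. (6.17) says that $p_\alpha$ decreases with increasing values of x, which is intuitively correct.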

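We next calculate $W_\alpha$, as follows:

$$W_\alpha = (a_\beta + t_\beta - t_\alpha \mid a_\beta + t_\beta \ge t_\alpha,\ a_\beta \ge x)$$

$$= \begin{cases} (a_\beta - t_\alpha) + t_\beta & \text{if } a_\beta \ge t_\alpha,\ a_\beta \ge x \\ t_\beta - (t_\alpha - a_\beta) & \text{if } a_\beta \le t_\alpha \le a_\beta + t_\beta,\ a_\beta \ge x \end{cases}$$

$$\sim \begin{cases} a_\beta + t_\beta & \text{if } a_\beta \ge t_\alpha, \text{ given } a_\beta \ge x \\ t_\beta & \text{if } a_\beta < t_\alpha, \text{ given } a_\beta \ge x \end{cases}$$

$$E(W_\alpha) = E(a_\beta + t_\beta \mid a_\beta \ge x)\, P(a_\beta \ge t_\alpha \mid a_\beta \ge x) + E(t_\beta)\, P(a_\beta < t_\alpha \mid a_\beta \ge x)$$

$$= E(t_\beta) + E(a_\beta \mid a_\beta \ge x)\, P(a_\beta \ge t_\alpha \mid a_\beta \ge x)$$

Now, with $\mu_1 = \mu_\alpha - \lambda_\alpha$ and $\mu_2 = \mu_\beta - \lambda_\beta$,

$$P(a_\beta \ge t_\alpha \mid a_\beta \ge x) = \begin{cases} \mu_1/(\mu_1 + \lambda_\beta) & \text{given } t_\alpha \ge x \\ 1 & \text{given } t_\alpha < x \end{cases}$$

$$= \frac{\mu_1\, e^{-\mu_1 x}}{\mu_1 + \lambda_\beta} + 1 - e^{-\mu_1 x}$$

Therefore,

$$E(W_\alpha) = \frac{1}{\mu_2} + \left( \frac{1}{\lambda_\beta} + x \right) \left( 1 - \frac{\lambda_\beta\, e^{-\mu_1 x}}{\mu_1 + \lambda_\beta} \right) \tag{6.18}$$

Case 2: $TS_o = TS_i - x$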


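Inspection of Fig. 6.11(b) gives

$$p_\alpha = P(t_\alpha + x \ge a_\beta + t_\beta)$$

$$W_\alpha = (a_\beta + t_\beta - t_\alpha - x \mid a_\beta + t_\beta \ge t_\alpha + x)$$

Note that any $i_\beta$ that arrives at the system at a time after $TS_o$ will have a timestamp bigger than the read condition of $i_\alpha$. Moreover, due to the memoryless property of Poisson processes, $a_\beta$ is still exponential with mean $1/\lambda_\beta$.

$p_\alpha$ can be rewritten as: $p_\alpha = P(a_\beta \le T + x)$, where $T = t_\alpha - t_\beta$,

$$= \begin{cases} 1 - e^{-\lambda_\beta (T + x)} & \text{for } T + x \ge 0 \\ 0 & \text{otherwise.} \end{cases}$$

Now T, being the difference of two exponentials, has a pdf as shown in Fig. 6.13. With $\mu_1 = \mu_\alpha - \lambda_\alpha$ and $\mu_2 = \mu_\beta - \lambda_\beta$,

$$f_T(T_0) = \begin{cases} \dfrac{\mu_1 \mu_2}{\mu_1 + \mu_2}\, e^{-\mu_1 T_0} & T_0 \ge 0 \\[2ex] \dfrac{\mu_1 \mu_2}{\mu_1 + \mu_2}\, e^{\mu_2 T_0} & T_0 < 0 \end{cases}$$

Hence, for $\mu_2 \ne \lambda_\beta$,

$$E(e^{-\lambda_\beta T} \mid T + x \ge 0) = \frac{1}{P(T + x \ge 0)} \int_{-x}^{\infty} e^{-\lambda_\beta T_0} f_T(T_0)\, dT_0 = \frac{1}{P(T + x \ge 0)} \cdot \frac{\mu_1 \mu_2}{\mu_1 + \mu_2} \left[ \frac{1 - e^{-(\mu_2 - \lambda_\beta) x}}{\mu_2 - \lambda_\beta} + \frac{1}{\lambda_\beta + \mu_1} \right] \tag{6.19}$$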
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Figure  6.13  pdf  of  T,  the  difference  of  two  exponentials 
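and, with $\mu_1 = \mu_\alpha - \lambda_\alpha$ and $\mu_2 = \mu_\beta - \lambda_\beta$,

$$P(T + x \ge 0) = 1 - P(T + x < 0) = 1 - P(T < -x) = 1 - \frac{\mu_1\, e^{-\mu_2 x}}{\mu_1 + \mu_2} \tag{6.20}$$

Hence

$$p_\alpha = \left[ 1 - e^{-\lambda_\beta x}\, E(e^{-\lambda_\beta T} \mid T + x \ge 0) \right] P(T + x \ge 0) \tag{6.21}$$

where $E(e^{-\lambda_\beta T} \mid T + x \ge 0)$ and $P(T + x \ge 0)$ are given by (6.19) and (6.20).

$$W_\alpha = (a_\beta + t_\beta - t_\alpha - x \mid a_\beta + t_\beta \ge t_\alpha + x)$$

$$= \begin{cases} a_\beta - (t_\alpha + x) + t_\beta & \text{if } a_\beta \ge t_\alpha + x \\ t_\beta - (t_\alpha + x - a_\beta) & \text{if } a_\beta \le t_\alpha + x \le a_\beta + t_\beta \end{cases}$$

$$\sim \begin{cases} a_\beta + t_\beta & \text{if } a_\beta \ge t_\alpha + x \\ t_\beta & \text{if } a_\beta \le t_\alpha + x \le a_\beta + t_\beta \end{cases}$$

$$E(W_\alpha) = E(t_\beta) + E(a_\beta)\, P(a_\beta \ge t_\alpha + x)$$

Now the pdf of $z = t_\alpha - a_\beta$ is

$$f_z(z_0) = \begin{cases} \dfrac{\mu_1 \lambda_\beta}{\mu_1 + \lambda_\beta}\, e^{\lambda_\beta z_0} & \text{for } z_0 < 0 \\[2ex] \dfrac{\mu_1 \lambda_\beta}{\mu_1 + \lambda_\beta}\, e^{-\mu_1 z_0} & \text{for } z_0 \ge 0 \end{cases}$$

Hence,

$$P(a_\beta \ge t_\alpha + x) = P(z \le -x) = \frac{\mu_1\, e^{-\lambda_\beta x}}{\mu_1 + \lambda_\beta}$$

$$E(W_\alpha) = \frac{1}{\mu_2} + \frac{\mu_1\, e^{-\lambda_\beta x}}{\lambda_\beta (\mu_1 + \lambda_\beta)} \tag{6.22}$$

where $\mu_1 = \mu_\alpha - \lambda_\alpha$ and $\mu_2 = \mu_\beta - \lambda_\beta$.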


For both Case 1 and Case 2,

$$E(\text{delay of a random read message}) = p_\alpha \left[ D + E(W_\alpha) \right] + (1 - p_\alpha)\, E(W_\alpha) \tag{6.23}$$

where D is the penalty due to a read rejection. The expected delay can thus be minimized, for given values of $D$, $\lambda_\alpha$, $\lambda_\beta$, $\mu_\alpha$, $\mu_\beta$, by varying x.
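The minimization over x can be carried out numerically. The Python sketch below does a simple grid search for Case 1 ($TS_o = TS_i + x$). It assumes the Case 1 forms $p_\alpha = \frac{\mu_2}{\mu_1+\mu_2}\cdot\frac{\lambda_\beta}{\lambda_\beta+\mu_1}\, e^{-\mu_1 x}$ and $E(W_\alpha) = \frac{1}{\mu_2} + (\frac{1}{\lambda_\beta} + x)(1 - \frac{\lambda_\beta e^{-\mu_1 x}}{\mu_1+\lambda_\beta})$, with $\mu_1 = \mu_\alpha - \lambda_\alpha$ and $\mu_2 = \mu_\beta - \lambda_\beta$, as they are read here from Equ. (6.17) and (6.18), which are hard to make out in the source; the function names, the search range `x_max` and the grid resolution are likewise assumptions of this illustration.

```python
import math


def rejection_probability_case1(x, mu1, mu2, lam_b):
    """p_alpha for TS_o = TS_i + x (assumed form of Equ. (6.17))."""
    return (mu2 / (mu1 + mu2)) * (lam_b / (lam_b + mu1)) * math.exp(-mu1 * x)


def expected_wait_case1(x, mu1, mu2, lam_b):
    """E(W_alpha) for TS_o = TS_i + x (assumed form of Equ. (6.18))."""
    return 1.0 / mu2 + (1.0 / lam_b + x) * (
        1.0 - lam_b * math.exp(-mu1 * x) / (lam_b + mu1)
    )


def expected_read_delay(x, mu1, mu2, lam_b, penalty):
    """Expected delay of a random read message, Equ. (6.23)."""
    p = rejection_probability_case1(x, mu1, mu2, lam_b)
    w = expected_wait_case1(x, mu1, mu2, lam_b)
    return p * (penalty + w) + (1.0 - p) * w


def best_offset(mu1, mu2, lam_b, penalty, x_max=2.0, steps=2000):
    """Grid search for the offset x in [0, x_max] minimizing Equ. (6.23)."""
    candidates = [x_max * k / steps for k in range(steps + 1)]
    return min(
        candidates,
        key=lambda x: expected_read_delay(x, mu1, mu2, lam_b, penalty),
    )
```

With a large rejection penalty the best offset moves away from zero: rejections become rarer at the cost of a longer wait. With a small penalty the search returns x = 0.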


6.3.4  Optimal  Time  to  Send  Nullwrites 

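A read message with read condition $\langle TS_o, (j,k,\ldots) \rangle$, arriving at $DM_\alpha$, must wait for the arrival of write messages from conflicting classes (j,k,...) with timestamp greater than $TS_o$. If no conflicting write message comes into the system, there will be an excessive wait. One proposal to remedy this problem is to have TM's send nullwrite messages periodically.

Consider a database system in which nullwrites will be sent whenever the time since the last write message is greater than S. The interarrival time of write messages (both regular writes and nullwrites) will now be

$$f_{a_\beta}(a) = \frac{\lambda_\beta\, e^{-\lambda_\beta a}}{1 - e^{-\lambda_\beta S}}, \quad 0 \le a \le S \tag{6.24}$$

i.e. an exponential constrained to be less than S.

Consider an instant of time when a read message has just arrived at $TM_\alpha$. The time $a'_\beta$ we have to wait until the arrival of a conflicting write message at $TM_\beta$ is not of the form given in (6.24), but rather has a pdf given by random incidence equations (see [DRAK67]):

$$f_{a'_\beta}(a) = \frac{1 - F_{a_\beta}(a)}{E(a_\beta)}$$

Now

$$F_{a_\beta}(a) = P(a_\beta \le a) = \begin{cases} \dfrac{1 - e^{-\lambda_\beta a}}{1 - e^{-\lambda_\beta S}} & 0 \le a \le S \\[2ex] 1 & a > S \end{cases}$$

and

$$E(a_\beta) = \frac{1 - e^{-\lambda_\beta S} (\lambda_\beta S + 1)}{\lambda_\beta \left( 1 - e^{-\lambda_\beta S} \right)}$$

so that

$$f_{a'_\beta}(a) = \frac{\lambda_\beta \left( e^{-\lambda_\beta a} - e^{-\lambda_\beta S} \right)}{1 - e^{-\lambda_\beta S} (\lambda_\beta S + 1)}, \quad 0 \le a \le S \tag{6.25}$$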
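By inspection of Fig. 6.9(b) and Fig. 6.10, in which $a_\beta$ must be replaced by $a'_\beta$, we get:

$$p_\alpha = P(t_\alpha \ge a'_\beta + t_\beta)$$

$$W_\alpha = (a'_\beta + t_\beta - t_\alpha \mid a'_\beta + t_\beta \ge t_\alpha)$$

Now,

$$P(t_\alpha \ge a'_\beta + t_\beta) = P(t_\alpha \ge t_\beta)\, P(t_\alpha - t_\beta \ge a'_\beta \mid t_\alpha \ge t_\beta) = P(t_\alpha \ge t_\beta)\, P(t_\alpha \ge a'_\beta) \tag{6.26}$$

Let $A = 1 - e^{-\lambda_\beta S} (\lambda_\beta S + 1)$, so that

$$f_{a'_\beta}(a) = \lambda_\beta \left( e^{-\lambda_\beta a} - e^{-\lambda_\beta S} \right) / A, \quad 0 \le a \le S.$$

With $\mu_1 = \mu_\alpha - \lambda_\alpha$ and $\mu_2 = \mu_\beta - \lambda_\beta$,

$$P(t_\alpha \ge a'_\beta) = \int_0^S e^{-\mu_1 a} f_{a'_\beta}(a)\, da = B$$

where

$$B = \frac{\lambda_\beta}{A} \left[ \frac{1 - e^{-(\lambda_\beta + \mu_1) S}}{\lambda_\beta + \mu_1} - \frac{e^{-\lambda_\beta S} \left( 1 - e^{-\mu_1 S} \right)}{\mu_1} \right]$$

$P(a'_\beta \ge t_\alpha) = 1 - P(a'_\beta \le t_\alpha) = 1 - B$, and $P(t_\alpha \ge t_\beta) = \mu_2/(\mu_1 + \mu_2)$.

Hence, from Equ. (6.26),

$$p_\alpha = \frac{\mu_2\, B}{\mu_1 + \mu_2} \tag{6.29}$$

$$W_\alpha = (a'_\beta + t_\beta - t_\alpha \mid a'_\beta + t_\beta \ge t_\alpha)$$

$$E(W_\alpha) = E(t_\beta) + E(a'_\beta)\, P(a'_\beta \ge t_\alpha)$$

$$E(a'_\beta) = \int_0^S a\, \lambda_\beta \left( e^{-\lambda_\beta a} - e^{-\lambda_\beta S} \right) / A\ da = \frac{1}{\lambda_\beta} - \frac{\lambda_\beta S^2 e^{-\lambda_\beta S}}{2A}$$

$$E(W_\alpha) = \frac{1}{\mu_2} + \left( \frac{1}{\lambda_\beta} - \frac{\lambda_\beta S^2 e^{-\lambda_\beta S}}{2A} \right) (1 - B) \tag{6.30}$$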


Substituting these values of $p_\alpha$ and $E(W_\alpha)$ into Equ. (6.23) for the expected delay of a random read message, we can minimize the expected delay by choosing an appropriate value for S.
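The random-incidence quantities used in this section can be checked numerically. The Python sketch below assumes the residual-time pdf $f_{a'_\beta}(a) = \lambda_\beta (e^{-\lambda_\beta a} - e^{-\lambda_\beta S})/A$ on $[0, S]$ with $A = 1 - e^{-\lambda_\beta S}(\lambda_\beta S + 1)$, and compares the closed form $E(a'_\beta) = 1/\lambda_\beta - \lambda_\beta S^2 e^{-\lambda_\beta S}/(2A)$ with a direct trapezoidal integration. The function names and the step count are illustrative choices, not part of the thesis.

```python
import math


def residual_pdf(a, lam_b, s):
    """pdf of the wait a'_beta seen by a read arriving at a random instant."""
    big_a = 1.0 - math.exp(-lam_b * s) * (lam_b * s + 1.0)
    return lam_b * (math.exp(-lam_b * a) - math.exp(-lam_b * s)) / big_a


def mean_residual_closed_form(lam_b, s):
    """E(a'_beta) = 1/lam - lam * S^2 * exp(-lam * S) / (2A)."""
    big_a = 1.0 - math.exp(-lam_b * s) * (lam_b * s + 1.0)
    return 1.0 / lam_b - lam_b * s * s * math.exp(-lam_b * s) / (2.0 * big_a)


def integrate(f, lo, hi, steps=20000):
    """Trapezoidal rule on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for k in range(1, steps):
        total += f(lo + k * h)
    return total * h
```

For $\lambda_\beta = 2$ and $S = 1$ the pdf integrates to 1 and both routes give $E(a'_\beta) \approx 0.2722$. As S grows the mean approaches $1/\lambda_\beta$, the value without nullwrites.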

In Appendix III, we have developed expressions for $p_\alpha$ and $E(W_\alpha)$ when there are more than two conflicting transaction classes.

We can also derive expressions for $p_\alpha$ and $E(W_\alpha)$ for cases where $TS_o = TS_i + x$ and $TS_o = TS_i - x$, as we did in section 6.3.3. However, for exponential interarrival time constrained to be less than S, these expressions become extremely complex. Therefore, they are not included in this thesis.
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CHAPTER  7 

NUMERICAL  EXAMPLES 


We  shall  now  demonstrate  how  to  model  various  concurrency  control 
algorithms.  Four  examples  will  be  given:  (1)  Centralized  Locking  Algorithm 
with  Deadlock  Detection,  (2)  Distributed  Locking  Algorithm  with  Ordered 
Queues  for  Deadlock  Prevention,  (3)  Distributed  Locking  with  Prioritized 
Transactions  for  Deadlock  Prevention,  and  (4)  SDD-1. 

Fig. 7.1 shows the example distributed database that we are going to use in this chapter. It consists of a communication subnetwork with five nodes. For each communication channel, all of which are directed, we have indicated its capacity, the existing message flow (i.e. not counting anticipated distributed database traffic) and the existing transmission delay. There are three files X, Y and Z with redundant copies. The locations of the redundant copies are as indicated by the artificial file nodes and the artificial links. There are also five classes of transactions and their arrival rates are as indicated.

Recall that our DDB Model consists of five steps. The input data contained in Fig. 7.1 will be collected in Step 1. Step 2 is the Transaction Processing Model, which consists of the Query Processing Step and the Write Step. Using existing delay figures on the communication channels as input to the MSTl Query Processing Algorithm, we found which nodes a particular transaction will access to read a file. For example, class 1 transactions, with readset (X,Y), will read file X at node 1 and file Y at node 4. The transactions must also write on all copies of files in their writesets. The nodes accessed by different transactions for reads and writes are summarized in Fig. 7.2. Using this information, we can estimate the additional traffic generated on
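[Figure 7.1 legend: separate symbols mark computer sites and file nodes. Each communication channel is labelled with its capacity in Kbs (e.g. 100) and with the existing flow in Kbits and the present delay in msec (shown as a pair, e.g. (50,20)). Each transaction class is labelled with its arrival rate, readset and writeset, e.g. [$\lambda_1$, (X,Y), (Y,Z)] for transaction class 1.]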

Figure  7.1  Example  Distributed  Database 
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each  communication  channel  by  a  particular  transaction  under  different 
concurrency  control  algorithms. 

In Step 3 we calculate the transmission delay at each communication channel, given the additional database traffic. The additional traffic and hence the resultant delays on the channels will be different under different concurrency control algorithms. This will be discussed in more detail for each example. We distinguish between short messages with average length $1/\mu_1 = .1$ Kbit and long messages with mean length $1/\mu_2 = 1$ Kbit. Short messages include lock requests, lock releases, lock grants, read requests, commits, and acknowledgement messages. Long messages include file transfers and pre-commits. Each communication channel is modelled as an M/M/1 FCFS queue with mean service time

$$\frac{1}{\mu} = \frac{1}{\gamma_1 + \gamma_2} \left( \frac{\gamma_1}{\mu_1} + \frac{\gamma_2}{\mu_2} \right)$$

where $\gamma_1$, $\gamma_2$ are the arrival rates of the short and long messages respectively.
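The two ingredients of Step 3, the rate-weighted average $1/\mu$ and the M/M/1 queueing delay $\lambda/[\mu(\mu - \lambda)]$, can be sketched in Python as follows. The function names and default arguments are illustrative; the message lengths (.1 Kbit and 1 Kbit) are the ones stated in the text, and the average is expressed here in Kbit, the unit the text uses for $1/\mu_1$ and $1/\mu_2$.

```python
def mean_message_length(short_rate, long_rate, short_len=0.1, long_len=1.0):
    """Rate-weighted average (g1/mu1 + g2/mu2) / (g1 + g2), in Kbit."""
    return (short_rate * short_len + long_rate * long_len) / (short_rate + long_rate)


def mm1_queueing_delay(arrival_rate, service_rate):
    """Mean time spent waiting in queue (excluding service) in a stable M/M/1 queue."""
    if arrival_rate >= service_rate:
        raise ValueError("unstable queue: arrival rate must be below service rate")
    return arrival_rate / (service_rate * (service_rate - arrival_rate))
```

For .8 short and 60.3 long messages per second (the channel C13 figures used later in this chapter) the average length comes out at about .988 Kbit.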

In  Step  4,  we  estimate  the  probability  of  conflicts  and  the  delay 
due  to  conflict.  Conflict  models  for  the  four  concurrency  control  algorithms 
were  developed  in  Chapter  6. 

In Step 5 we calculate the response time, which is the sum of the query processing delay, the write delay and the delay due to conflicts.


7.1  Centralized  Two-Phase  Locking

Suppose Computer site 1 is chosen as the central node (see Fig. 7.1). All transactions have to request locks from this node.

We now consider the message flow generated by the DDB management system on behalf of the transactions. Fig. 7.3 summarizes the sequence of events corresponding to the processing of each transaction under Centralized Two-Phase Locking:

(1) transaction sends requests to central node to lock files in the readset,

(2) wait for read locks at central node,

(3) central node sends lock grant message to request node,

(4) query processing, result produced at request node,

(5) transaction sends request to central node to lock files in the writeset,

(6) wait for write locks at central node,

(7) central node sends lock grant message to request node,

(8) request node sends pre-commit messages to copies of files written on,

(9) copies send acknowledgement messages to request node,

(10) request node sends commit messages to copies and lock release message to central node.

For example, consider transactions arriving at node 4.

Rate          Message description                           Channels traversed   Message type
$\lambda_4$   read lock request from node 4 to 1            C43, C31             short
$\lambda_4$   read lock grant from 1 to 4                   C12, C24             short
$\lambda_4$   query processing: read request to 5           C45                  short
$\lambda_4$   file Z transferred from 5 to 4                C54                  long
$\lambda_4$   write lock request from 4 to 1                C43, C31             short
$\lambda_4$   write lock grant from 1 to 4                  C12, C24             short
$\lambda_4$   pre-commit messages to all copies of X        C43, C31, C12        long
$\lambda_4$   acknowledgement messages                      C12, C24             short
$\lambda_4$   commit messages to all copies of X            C43, C31, C12        short
$\lambda_4$   lock release from 4 to 1                      C43, C31             short
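The per-channel traffic implied by the table above can be tallied mechanically. The Python sketch below transcribes the ten class 4 messages and counts, for each channel, how many short and long messages one class 4 transaction places on it; the data layout and names are choices made for this illustration.

```python
from collections import Counter

# One (description, channels traversed, message type) row per message
# generated for a class 4 transaction, transcribed from the table above.
CLASS4_MESSAGES = [
    ("read lock request, node 4 to 1", ["C43", "C31"], "short"),
    ("read lock grant, node 1 to 4", ["C12", "C24"], "short"),
    ("read request to node 5", ["C45"], "short"),
    ("file Z transferred, node 5 to 4", ["C54"], "long"),
    ("write lock request, node 4 to 1", ["C43", "C31"], "short"),
    ("write lock grant, node 1 to 4", ["C12", "C24"], "short"),
    ("pre-commit to all copies of X", ["C43", "C31", "C12"], "long"),
    ("acknowledgements", ["C12", "C24"], "short"),
    ("commit to all copies of X", ["C43", "C31", "C12"], "short"),
    ("lock release, node 4 to 1", ["C43", "C31"], "short"),
]


def messages_per_channel(messages):
    """Count short and long messages per channel for one transaction."""
    load = {}
    for _description, channels, kind in messages:
        for channel in channels:
            load.setdefault(channel, Counter())[kind] += 1
    return load
```

Multiplying each count by $\lambda_4$ gives the class 4 contribution to a channel's additional traffic. For instance, channel C31 carries 4 short messages and 1 long message per class 4 transaction.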

Similar considerations for the other transactions give the additional message flow requirements generated by the database management system. We can then calculate the expected message delay on each of the communication channels. For example, for channel C13, we have $\lambda_1 + \lambda_2 + \lambda_3 + \lambda_5 = .8$ additional short messages per second, and $\lambda_1 = .3$ additional long messages per second. Assuming that the existing message traffic of 60 Kbit per second on C13, i.e. not including the DDB traffic, consists entirely of long messages, the total number of messages on channel C13 is 60 + .8 + .3 = 61.1 messages per second, with an average message length of (60.3 x 1K + .8 x .1K)/61.1 = .988 Kbits. The expected queueing delay* = 61.1/[(.988 K(100) - 61.1) x (.988 K(100))] = 16.4 msec, and the total delays (queueing plus service) for short and long messages are 17.4 msec and 26.4 msec respectively.

Similar  calculations  are  performed  for  the  other  channels  and  the 
result  is  summarized  in  Fig.  7.4. 

We now calculate the length of time each transaction holds a lock, i.e. from the time the central node sends out the lock grant message to the time it receives a lock release message from the request node. Let us denote the transmission delay on channel (i,j) for long and short messages by $r_{ij}$ and $s_{ij}$ respectively.

Consider Class i transactions. The length of time they hold write locks, denoted $WL_i$, is the sum of (see Fig. 7.3):

(1) transmission delay of write lock grant from central node to node i,

(2) delay due to node i sending pre-commits to copies and waiting for acknowledgement from copies, and

(3) transmission delay of lock release from node i to central node.

The length of time they hold read locks, denoted $RL_i$, is the sum of (see Fig. 7.3):

*For M/M/1 queues, queueing delay = $\lambda/[\mu(\mu - \lambda)]$, where $\lambda$ is the arrival rate and $\mu$ is the service rate.
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13 
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13 
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H 

50.8 

12.0  msec 

13.0  msec 

22.0  msec  1 

1 

1 

C'21 

3X2 

0 

.6 

10.6 

11.6 

1 

20.6 

C13 

Al+X2+A3 
+  X5 

X1 

.8 

60.  3 

16.4 

17.4 

26.4 

r  1 

31 

X1+2X3+ 

4A4 

B 

50.8 

26.9 

28.2 

39.4 

C24 

2  V- \ 

+2 

0 

X1  4X5 
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30.5 
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42.4 
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31.0 
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40.0 

36.9 

38.6 

53.6 
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X  +  X*+X 
13  5 

X1 

.6 

30.3 

34.5 

36.5 

54.5 

rr 

0 
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60.0 

53.  3 

54.6 

66.3 

X3+X5 

.9 

55.3 

42.9 
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40.1 
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121.0 

Figure  7.4  Additional  traffic  generated  by  DDB  Management  System 
under  Centralized  Locking  and  resultant  transmission 
delays  on  the  channels. 
(1) transmission delay of read lock grant from central node to node i,

(2) query processing delay,

(3) transmission delay of write lock request from node i to central node,

(4) queueing delay for write locks, and

(5) $WL_i$.

Hence,

$$WL_1 = s_{11} + \max(r_{12} + r_{24} + s_{43} + s_{31},\ r_{13} + r_{35} + s_{53} + s_{31}) + s_{11}$$

$$E(WL_1) = E(s_{11}) + E[\max(r_{12} + r_{24} + s_{43} + s_{31},\ r_{13} + r_{35} + s_{53} + s_{31})] + E(s_{11})$$

$$\approx \max[E(r_{12} + r_{24} + s_{43} + s_{31}),\ E(r_{13} + r_{35} + s_{53} + s_{31})] + 0 = 153.3 \text{ msec}$$

and

$$RL_1 = s_{11} + (s_{12} + s_{24} + r_{43} + r_{31}) + s_{11} + Q(Z) + WL_1$$

$$E(RL_1) = 291.1 \text{ msec} + Q(Z)$$

where Q(I) = queueing delay for file I, I = X,Y,Z, at the central node. It is not necessary to request a lock on file Y again since the transaction already owns a lock on file Y.

Similarly, for Class 2 transactions:

$E(WL_2) = 0$ since Class 2 transactions have empty writesets

$$RL_2 = s_{12} + (s_{21} + s_{13} + r_{31} + r_{12}) + s_{21}$$

$$E(RL_2) = 115 \text{ msec}$$

For Class 3 transactions:

$$E(WL_3) = 0$$

$$RL_3 = s_{13} + (s_{35} + r_{53}) + s_{31}$$

$$E(RL_3) = 138.3 \text{ msec}$$

For Class 4 transactions:

$$WL_4 = (s_{12} + s_{24}) + \max(r_{43} + r_{31} + r_{12} + s_{24},\ r_{43} + r_{31} + s_{12} + s_{24}) + (s_{43} + s_{31})$$

$$E(WL_4) = 261.4 \text{ msec}$$

$$RL_4 = (s_{12} + s_{24}) + (s_{45} + r_{54}) + (s_{43} + s_{31}) + Q(X) + WL_4$$

$$E(RL_4) = 535.6 \text{ msec} + Q(X)$$

For Class 5 transactions:

$$WL_5 = (s_{13} + s_{35}) + \max(r_{53} + s_{35},\ r_{53} + r_{31} + s_{13} + s_{35},\ r_{52} + s_{21} + s_{13} + s_{35},\ r_{54} + s_{45}) + (s_{53} + s_{31})$$

$$E(WL_5) = 285.9 \text{ msec}$$
The length of time a lock is held depends on which transaction owns the lock and whether it is a write lock or a read lock. Therefore, to find the average time a lock is held, one must weight the respective lock-holding times corresponding to different transactions by their arrival rates.

Therefore, the average length of time a lock on file X is held is

$$b_X = \frac{\lambda_1 RL_1 + \lambda_2 RL_2 + \lambda_4 WL_4 + \lambda_5 WL_5}{\lambda_1 + \lambda_2 + \lambda_4 + \lambda_5} = 215.2 \text{ msec} + .375\,Q(Z)$$

Similarly, the average length of time a lock on file Y is held is

$$b_Y = \frac{\lambda_1 RL_1 + \lambda_3 RL_3 + \lambda_1 WL_1 + \lambda_5 WL_5}{2\lambda_1 + \lambda_3 + \lambda_5} = 227.0 \text{ msec} + .333\,Q(Z)$$

and

$$b_Z = \frac{\lambda_2 RL_2 + \lambda_3 RL_3 + \lambda_4 RL_4 + \lambda_1 WL_1 + \lambda_5 WL_5}{\lambda_1 + \lambda_2 + \lambda_3 + \lambda_4 + \lambda_5} = 215.1 \text{ msec} + .111\,Q(X)$$
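The rate-weighted averages can be reproduced with a few lines of Python. In the sketch below each lock-holding time is written as a constant (in msec) plus a coefficient on an as-yet-unknown queueing delay, so that the result has the same "constant + coefficient x Q" shape as in the text. The arrival rates $\lambda_1 = .3$, $\lambda_2 = .2$, $\lambda_3 = \lambda_4 = .1$, $\lambda_5 = .2$ are assumed here because they are consistent with the sums quoted in this chapter; the names are illustrative.

```python
# Assumed arrival rates of the five transaction classes (per second).
LAM = {1: 0.3, 2: 0.2, 3: 0.1, 4: 0.1, 5: 0.2}


def average_hold_time(terms):
    """Rate-weighted average of hold times given as (rate, const_msec, q_coeff).

    Returns (constant in msec, coefficient of the queueing delay Q).
    """
    total_rate = sum(rate for rate, _const, _coeff in terms)
    const = sum(rate * c for rate, c, _coeff in terms) / total_rate
    coeff = sum(rate * k for rate, _const, k in terms) / total_rate
    return const, coeff


# b_Y: RL_1 (contains Q(Z)), RL_3, WL_1, WL_5.
B_Y = average_hold_time([
    (LAM[1], 291.1, 1.0), (LAM[3], 138.3, 0.0),
    (LAM[1], 153.3, 0.0), (LAM[5], 285.9, 0.0),
])

# b_Z: RL_2, RL_3, RL_4 (contains Q(X)), WL_1, WL_5.
B_Z = average_hold_time([
    (LAM[2], 115.0, 0.0), (LAM[3], 138.3, 0.0), (LAM[4], 535.6, 1.0),
    (LAM[1], 153.3, 0.0), (LAM[5], 285.9, 0.0),
])
```

Under these assumptions `B_Y` is approximately (227.0, .333) and `B_Z` approximately (215.1, .111), matching the expressions for $b_Y$ and $b_Z$.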

$b_X$, $b_Y$ and $b_Z$ correspond to the average service times of the queues to lock file X, Y and Z respectively. If we assume these service times to be exponential, then the service rates of the three queues are $\mu_X = 1/b_X$, $\mu_Y = 1/b_Y$ and $\mu_Z = 1/b_Z$. The arrival rates of the lock requests are:

$$\lambda_X = \lambda_1 + \lambda_2 + \lambda_4 + \lambda_5 = .8,$$

$$\lambda_Y = 2\lambda_1 + \lambda_3 + \lambda_5 = .9,$$

$$\lambda_Z = \lambda_1 + \lambda_2 + \lambda_3 + \lambda_4 + \lambda_5 = .9.$$
Now,

$$Q(X) = \frac{\lambda_X}{\mu_X (\mu_X - \lambda_X)} = \frac{\lambda_X b_X^2}{1 - \lambda_X b_X} = \frac{\lambda_X (.2152 + .375\,Q(Z))^2}{1 - \lambda_X (.2152 + .375\,Q(Z))} \tag{7.1}$$

$$Q(Z) = \frac{\lambda_Z}{\mu_Z (\mu_Z - \lambda_Z)} = \frac{\lambda_Z b_Z^2}{1 - \lambda_Z b_Z} = \frac{\lambda_Z (.2151 + .111\,Q(X))^2}{1 - \lambda_Z (.2151 + .111\,Q(X))} \tag{7.2}$$

Equ. (7.1) and (7.2) can be solved simultaneously to obtain values for Q(X) and Q(Z). An iterative solution technique follows:

    Initialize:  Q(X) = Q(Z) = 0
    Do until:    Q(X) is close to Q(X)' and Q(Z) is close to Q(Z)'
    Begin:
        Q(X)' = λ_X (.2152 + .375 Q(Z))^2 / (1 - λ_X (.2152 + .375 Q(Z)))
        Q(Z)' = λ_Z (.2151 + .111 Q(X))^2 / (1 - λ_Z (.2151 + .111 Q(X)))
        Q(X) = Q(X)'
        Q(Z) = Q(Z)'
    end;                                                        (7.3)

In this case, it is found, after three iterations, that

Q(X) = .0547 sec,

Q(Z) = .0549 sec.

Therefore, $\mu_X = 4.24$, $\mu_Y = 4.08$, $\mu_Z = 4.52$ and the utilizations of the three queues are $\rho_X = \lambda_X/\mu_X = .189$, $\rho_Y = \lambda_Y/\mu_Y = .221$, and $\rho_Z = \lambda_Z/\mu_Z = .199$ respectively.
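The iteration (7.3) translates directly into Python. The sketch below performs a fixed number of simultaneous updates of Q(X) and Q(Z), starting from zero, with the constants taken from Equ. (7.1) and (7.2); the function names and the fixed iteration count (in place of a "close enough" test) are choices made for this illustration.

```python
def queue_delay(arrival_rate, mean_hold):
    """M/M/1 queueing delay written as lam * b^2 / (1 - lam * b)."""
    return arrival_rate * mean_hold ** 2 / (1.0 - arrival_rate * mean_hold)


def solve_lock_queues(lam_x=0.8, lam_z=0.9, iterations=3):
    """Iterate Equ. (7.1) and (7.2) together, starting from Q(X) = Q(Z) = 0."""
    q_x = q_z = 0.0
    for _ in range(iterations):
        # Both updates use the values from the previous pass, as in (7.3).
        new_q_x = queue_delay(lam_x, 0.2152 + 0.375 * q_z)
        new_q_z = queue_delay(lam_z, 0.2151 + 0.111 * q_x)
        q_x, q_z = new_q_x, new_q_z
    return q_x, q_z
```

Three passes give Q(X) of about .0547 sec and Q(Z) of about .0549 sec, the values quoted above.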
We must next calculate the expected additional delay due to deadlocks. Suppose the deadlock detector constructs the waits-for graph every one second, i.e. on the average it takes 1/2 second to detect a deadlock.

Fig. 7.5 shows the readsets and writesets of the five Classes of transactions in our example and the potential deadlocks. To simplify the model, we are ignoring deadlocks that involve more than two transactions.

In addition, as is mentioned in Section 6.2.4, Class 2 and Class 3 transactions, with empty writesets, will not enter into a deadlock with other transactions. Therefore, in our example, there are three pairs of transactions that can create deadlocks. Consider the Class 1 and Class 4 pair. Suppose $T_1$ arrives first. A deadlock situation is shown in Fig. 7.6(a). AB corresponds to the time between the arrival of the request to lock the readset and the request to lock the writeset.

Inspection of Fig. 7.3 gives

AB = queueing delay for read locks + time read locks held - time write locks held - queueing delay for write locks
   = $Q(X,Y) + RL_1 - WL_1 - Q(Z)$

where $Q(X,Y)$ = max. queueing delay at queues X and Y, and

$$Q(X,Y) = \frac{\rho_X}{\mu_X(1-\rho_X)} + \frac{\rho_Y}{\mu_Y(1-\rho_Y)} - \frac{\rho_X \rho_Y}{\mu_X(1-\rho_X) + \mu_Y(1-\rho_Y)} \quad \text{(see Appendix II)}$$

= .118 sec.

Hence, E(AB) = .118 sec. + .2911 sec. - .1533 sec. = .2558 sec.
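The value of Q(X,Y) can be checked with a short Python sketch. It assumes the standard M/M/1 waiting-time tail, $P(W > t) = \rho e^{-\mu(1-\rho)t}$, and independence of the two queues; Appendix II is not reproduced here, so this is an illustration of one derivation that agrees with the printed numbers, and the function name is hypothetical.

```python
def expected_max_wait(rho_a, mu_a, rho_b, mu_b):
    """E[max(W_a, W_b)] for two independent M/M/1 queueing delays.

    Uses E[max] = E[W_a] + E[W_b] - E[min], where E[min] is the integral
    of the product of the two tails rho * exp(-mu * (1 - rho) * t).
    """
    decay_a = mu_a * (1.0 - rho_a)
    decay_b = mu_b * (1.0 - rho_b)
    expected_min = rho_a * rho_b / (decay_a + decay_b)
    return rho_a / decay_a + rho_b / decay_b - expected_min


# Utilizations and service rates as quoted in the text.
q_xy = expected_max_wait(0.189, 4.24, 0.221, 4.08)
q_xz = expected_max_wait(0.189, 4.24, 0.199, 4.52)
print(f"Q(X,Y) = {q_xy:.3f} sec, Q(X,Z) = {q_xz:.4f} sec")
```

Under these assumptions the sketch gives roughly .118 sec. for Q(X,Y) and roughly .1045 sec. for Q(X,Z), matching the values used in this section.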

The  symmetric  situation  of  a  Class  4  transaction  arriving  first  and 
deadlocked  with  a  Class  1  transaction  which  arrives  later  is  shown  in 


Fig.  7.6 (b) . 
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Figure  7.5  Potential  Deadlocks  for  Example  DDB 
[Figure 7.6: timelines for (a) $T_1$ arrives, joins queues to lock read set (X, Y), then joins queues to lock write set (Y, Z); (b) the symmetric case.]

Figure 7.6  Finding Probability of Deadlocks
$A'B' = Q(Z) + RL_4 - WL_4 - Q(X)$

E(A'B') = .0549 sec. + .5356 sec. - .2614 sec. = .3291 sec.

Let $DD_{ij}$ = expected delay for $T_i$ due to possible deadlock between $T_i$ and $T_j$.

From Equ. (6.10) and (6.11) respectively, we find

$$DD_{14} = \frac{1}{2} \cdot \frac{.3}{.3+.1} \cdot \frac{.1}{.1+1/.2558} + \left(.3291 + \frac{1}{2}\right) \frac{.1}{.3+.1} \cdot \frac{.3}{.3+1/.3291} = .028 \text{ sec.}$$

and $DD_{41}$ = .027 sec.

41 

Similarly, we can calculate the expected delay due to the other two
potential deadlocks.

$$DD_{15} = \frac{1}{2} \cdot \frac{.3}{.3+.2} \cdot \frac{.2}{.2+1/.2558} = .0146 \text{ sec.}$$

$$DD_{51} = \left(.2558 + \frac{1}{2}\right) \frac{.3}{.3+.2} \cdot \frac{.2}{.2+1/.2558} = .0221 \text{ sec.}$$

$$DD_{45} = \frac{1}{2} \cdot \frac{.1}{.1+.2} \cdot \frac{.2}{.2+1/.3291} = .0103 \text{ sec.}$$

$$DD_{54} = \left(.3291 + \frac{1}{2}\right) \frac{.1}{.1+.2} \cdot \frac{.2}{.2+1/.3291} = .0171 \text{ sec.}$$

(Note that there is no possibility of deadlock when the Class 5 transaction,
which has empty readset, arrives first).
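The deadlock-delay figures can be reproduced with the Python sketch below. Equ. (6.10) and (6.11) are not reproduced in this section, so the product form used here is an assumption read off the printed fractions, as are the class arrival rates (.3, .1 and .2 for Classes 1, 4 and 5). The helper name is hypothetical.

```python
DETECT = 0.5  # mean time to detect a deadlock, in seconds

E_AB = 0.2558         # E(AB), Class 1 window
E_AB_PRIME = 0.3291   # E(A'B'), Class 4 window
LAM1, LAM4, LAM5 = 0.3, 0.1, 0.2  # assumed class arrival rates


def deadlock_term(extra, lam_first, lam_second, window):
    """One product term as it appears in the printed expressions.

    lam_first / (lam_first + lam_second): the first class arrives first.
    lam_second / (lam_second + 1 / window): the second class arrives
    within an exponentially distributed window of the given mean.
    """
    p_order = lam_first / (lam_first + lam_second)
    p_overlap = lam_second / (lam_second + 1.0 / window)
    return extra * p_order * p_overlap


dd14 = (deadlock_term(DETECT, LAM1, LAM4, E_AB)
        + deadlock_term(E_AB_PRIME + DETECT, LAM4, LAM1, E_AB_PRIME))
dd15 = deadlock_term(DETECT, LAM1, LAM5, E_AB)
dd51 = deadlock_term(E_AB + DETECT, LAM1, LAM5, E_AB)
dd45 = deadlock_term(DETECT, LAM4, LAM5, E_AB_PRIME)
dd54 = deadlock_term(E_AB_PRIME + DETECT, LAM4, LAM5, E_AB_PRIME)
print(dd14, dd15, dd51, dd45, dd54)
```

Under these assumptions the five values come out close to .028, .0146, .0221, .0103 and .0171 sec.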

We can now calculate the response time for the different transaction
classes. For Class i transactions, average response time under Centralized
Locking, $RCL_i$ = lock request transmission time + queueing time for locks
+ time locks held - lock release transmission time + expected delay due to
deadlock (See Fig. 7.3).

Therefore,

$RCL_1 = s_{11} + Q(X,Y) + RL_1 - s_{11} + DD_{14} + DD_{15}$
       = 0 + .118 sec. + (.2911 + .0549) sec. - 0 + .028 sec. + .0146 sec. = .507 sec.

$RCL_2 = s_{21} + Q(X,Z) + RL_2 - s_{21}$
       = .0116 sec. + .1045 sec. + .115 sec. - .0116 sec. = .220 sec.

$RCL_3 = s_{31} + Q(Y,Z) + RL_3 - s_{31}$
       = .0282 sec. + .118 sec. + .1383 sec. - .0282 sec. = .256 sec.

$RCL_4 = s_{43} + s_{31} + Q(Z) + RL_4 - s_{43} - s_{31} + DD_{41} + DD_{45}$
       = .031 sec. + .0282 sec. + .0549 sec. + (.5356 + .0547) sec. - .031 sec. - .0282 sec. + .027 sec. + .103 sec. = .775 sec.

$RCL_5 = s_{53} + s_{31} + Q(X,Y,Z) + WL_5 - s_{53} - s_{31} + DD_{51} + DD_{54}$
       = .0442 sec. + .0282 sec. + .162 sec. + .2859 sec. - .0442 sec. - .0282 sec. + .0221 sec. + .0171 sec. = .506 sec.
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7 . 2  Distributed  Two-Phase  Locking  with  Ordered  Queues  for  Deadlock  Prevention 

In  Distributed  Locking,  there  is  no  central  node,  and  a  transaction 
requests  lock  at  the  node  where  the  data  item  is  located.  Compared  to 
Centralized  Locking,  this  algorithm  is  superior  in  that  (1)  there  is  no 
central node which is the bottleneck in Centralized Locking, and (2) fewer
messages will be generated since read lock request and read request messages
can  be  combined  into  one  read  message.  (This  is  not  possible  in  Centralized 
Locking  since  the  central  node  might  be  different  from  the  node  where  one 
wants  to  read  a  file  copy.)  The  major  drawback  of  Distributed  Locking  is 
that  deadlock  detection  is  no  longer  feasible. 

Consider  the  example  shown  in  Fig.  7.1.  Suppose  Ordered  Queues  is  used 
to  prevent  deadlocks,  i.e.  all  transactions  are  required  to  request  locks 
in  a  specific  order,  say  lock  file  X  first,  then  file  Y  and  then  file  Z. 

This  means  that  when  a  transaction  wants  to  access  files  at  different  nodes, 
say  file  X  at  node  1  and  file  Y  at  node  4,  it  must  send  the  lock  requests 
in  serial  order,  i.e.  request  lock  X  first,  and,  after  receiving  the  lock 
grant  from  node  1,  request  lock  Y.  If  it  wants  to  access  files  located  at 
the  same  node,  say  files  Y  and  Z  at  node  5,  it  can  send  the  lock  requests 
simultaneously  as  one  message.  However,  at  node  5,  it  must  wait  for  Y  first, 
then Z. Fig. 7.2 shows the nodes accessed by the different classes of
transactions to read and write data. Let $RN_i$ be the set of nodes that Class i
transactions access to read, and $WN_i$ be the set of nodes that Class i
transactions access to write. We now consider the message flow generated by
the DDB management system under Distributed Locking on behalf of the
transactions. Fig. 7.7 summarizes the sequence of events corresponding to the
processing of Class k transactions:
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Figure  7.7  Chronological  events  corresponding  to  Transaction  Processing  under 

Distributed  Two-phase  locking  with  Ordered  Queues  for  Deadlock  Prevention 
(A) For each node $i \in RN_k$, repeat the following in serial order, i.e. send
read messages to file X first, then Y, then Z:

(1) send read request to node i; the read lock request is piggy-backed
with this read request.

(2) queue for read locks at node i

(3) node i sends lock grant to request node and initiates file transfer

(B) For each node $j \in WN_k$, repeat the following in serial order:

(1) send pre-commit to node j. The write lock request is piggy-backed
with the pre-commit.

(2) queue for write lock at node j.

(3) node j sends acknowledgement message to request node.

(C) The request node sends commit and lock release messages to all file copies
in the writeset, and lock release messages to copies read by the
transaction.
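The Ordered Queues rule described earlier in this section (lock X, then Y, then Z; requests to different nodes go out serially, requests to the same node share one message) can be illustrated with a small Python sketch. It is a simplification under stated assumptions: each file is taken to have a single copy, and the function and variable names are hypothetical.

```python
LOCK_ORDER = ["X", "Y", "Z"]  # global locking order used in the example


def plan_lock_messages(files, node_of):
    """Group lock requests into serial messages that respect LOCK_ORDER.

    Consecutive files stored at the same node travel in one message; a
    file at a different node starts a new message, which would be sent
    only after the previous lock grant has been received.
    """
    batches = []
    for name in sorted(files, key=LOCK_ORDER.index):
        node = node_of[name]
        if batches and batches[-1][0] == node:
            batches[-1][1].append(name)
        else:
            batches.append((node, [name]))
    return batches


# File X at node 1 and file Y at node 4: two serial messages.
print(plan_lock_messages(["Y", "X"], {"X": 1, "Y": 4}))
# Files Y and Z both at node 5: one combined message, Y queued before Z.
print(plan_lock_messages(["Z", "Y"], {"Y": 5, "Z": 5}))
```

Because every transaction walks the files in the same global order, no cycle of waiting transactions can form, which is why this scheme prevents deadlock at the cost of serialized requests.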

The  additional  traffic  generated  by  the  DDB  management  system  on  behalf 
of  the  transactions  can  be  estimated,  and  the  transmission  delay  for  long 
and  short  messages  can  be  calculated.  The  procedure  is  described  in  detail 
in  section  7.1  and  will  not  be  repeated  here.  Fig.  7.8  summarizes  the  results 
of  these  calculations. 

We  next  calculate  the  length  of  time  each  transaction  holds  a  lock. 


Let $RL_{iWj}$ be the length of time a Class i transaction holds a read lock on
file W at node j, $WL_{iWj}$ be the time it holds a write lock, and $Q_j(W)$ be the
queueing time for file W at node j, then for Class 1 transactions: (See
Fig. 7.7 and Fig. 7.8)


WL  =  s  +  s  ,  WL  =  24.87  +  17.23  =  42.10  msec. 

i-  Lt  -5  J-L  -L  J 
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Figure  7.8  Additional  traffic  generated  by  DDB  management  system 

under  Distributed  Locking  with  Ordered  Queues  for  Deadlock 
Detection  and  resultant  transmission  delays  on  the  channels. 




WL1Y5  =  WL1Z5  =  (SS2  +  S21}  +  "13  +  e3(Z)  *  WL 
WLly[.  =  WL1z5  =  171.77  msec.  +  Qj(Z) 


1Z3  ”  S13  +(S13  +  S35) 


WL1Y4  "  (S43+  S31)  +  (rl3+  r35]  +  Q5(Y,Z)  +  WL  lY5  ~(sl3+  S35)  +  <S43+  S31) 


WL1y4  =  309.39  msec.  +  Q3(Z)  +  Q5(Y,Z) 

RL1Y4  =  (r43+  r315  +  ^174 

RLiy4  =  387.94  msec.  +  Q  ( Z)  Q5(Y,Z) 

“*1X1  =  ril  +  (S12+  S24)  +  VY)  +  RL1Y4 
RL1X1  =  438.09  msec.  +  Q4 (Y)  +  Q3(Z)  +  Q  (Y,Z) 

Similarly, for Class 2 transactions,

$RL_{2Z3} = (r_{31} + r_{12})$,  $RL_{2Z3}$ = 57.37 msec.

$RL_{2X2} = r_{22} + Q_3(Z) + RL_{2Z3}$
$RL_{2X2}$ = 57.37 msec. + $Q_3(Z)$

For  Class  3  transactions, 


RL3Z3 

=  r33 

rl 

-  r .  „ 

3Y4 

43 

^3Y4 

=  42. 

3Y4  ’  x3 

Class  4  transactions  are  slightly  different.  Since  the  writeset  contains 
file  X  and  the  readset  contains  file  Z,  to  maintain  the  serial  order  of 
locking  file  X  first,  then  Y  and  then  Z,  it  is  necessary  to  send  lock  X 
messages  before  sending  read  Z  requests.  (Recall  that  we  normally  send 
write lock requests piggy-backed with pre-commit messages). Therefore, for
Class 4 transactions only, the sequence of events becomes:
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(1)  send  write  lock  requests  to  all  copies  of  file  X 

(2)  send  read  request  to  node  5  to  read  Z 

(3)  send  pre-commits  (without  lock  requests)  to  all  copies  of  file  X 

(4)  copies  send  acknowledgement  to  node  4 

(5)  send  commits  and  lock  releases 

hence,  RL^  =  r54  +  (r^  +  +  r^)  +  (s^  +  s^)  +  (s43  +  831  +  8^) 

=  320.62  msec. 

4Z5 

“L4X1  ’  <»12  *  S24>  *  »45  *  %<Z>  +  «•«» 


WL  =  410.17  msec.  +  Qc(z) 
4X1  '5 


“L4X2  ’  =  24  +  S24  +  S45  +  «5,Z)  +  “to 


WL .  „  =  435.80  msec.  +  0C(Z) 
4X2  "5 


”L4X4  '  *44  *  S44  *  C5,Z>  +  “425 


WL4x4  =  320.62  msec.  +  Q5(Z) 


For Class 5 transactions,

$WL_{5Z3} = s_{35} + s_{53}$

$WL_{5Z3}$ = 76.49 msec.

$WL_{5Z5} = WL_{5Y5} = s_{55} + r_{53} + Q_3(Z) + s_{35} + s_{55}$

$WL_{5Z5} = WL_{5Y5}$ = 88.49 msec. + $Q_3(Z)$


“L5X4  ’  "L5Y4  *  *45  *  =55  *  V1'*1  *  "L5Z5  ‘  =55  *  =45 
5I5M  =  W^5Y4  =  167.20  msec.  .  Q3(z>  +  25<Y,Z) 


"L5X1  ‘  <S13  +  =35*  *  =54  *  <VX'y)  *  "Sx4  ‘  =45  *  <*13  +  S35’ 
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WL5X1  =  338.04  msec.  +  Q3(Z)  +  Q5<Y,Z)  +  Q4(X,Y) 


WL5X2  "  (S21  +  S13  +  S35>  +  r54  +  V*'Y)  +  «5X4  '  S45  +  (S21  +  S13  + 


WL5x2  =  361.18  msec.  +  Q3(Z)  +  Q5<Y,Z)  +  Q4(X,Y) 


The  length  of  time  a  lock  is  held  depends  on  which  transaction  owns 
the  lock.  Therefore,  to  find  the  average  length  of  time  a  lock  is  held  at 
each  node,  we  must  weight  the  respective  lock-holding  times  corresponding 
to  different  transactions  by  their  arrival  rates. 


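Let $b_{Wj}$ be the average length of time a lock on file W is held at node
j. This corresponds to the in-service time of a transaction holding a lock
on file W at node j, without accounting for delay due to blocking. (See
section 6.2.1)

$b_{X1} = (\lambda_1 RL_{1X1} + \lambda_4 WL_{4X1} + \lambda_5 WL_{5X1}) / (\lambda_1 + \lambda_4 + \lambda_5)$
       = 400.09 msec. + .5$Q_4(Y)$ + .833$Q_3(Z)$ + .1667$Q_5(Z)$ + .333$Q_4(X,Y)$ + .833$Q_5(Y,Z)$

$b_{X2} = (\lambda_2 RL_{2X2} + \lambda_4 WL_{4X2} + \lambda_5 WL_{5X2}) / (\lambda_2 + \lambda_4 + \lambda_5)$
       = 254.58 msec. + .8$Q_3(Z)$ + .2$Q_5(Z)$ + .4$Q_5(Y,Z)$ + .4$Q_4(X,Y)$

$b_{Z3} = (\lambda_1 WL_{1Z3} + \lambda_2 RL_{2Z3} + \lambda_3 RL_{3Z3} + \lambda_5 WL_{5Z3}) / (\lambda_1 + \lambda_2 + \lambda_3 + \lambda_5)$
       = 49.26 msec.

$b_{X4} = (\lambda_4 WL_{4X4} + \lambda_5 WL_{5X4}) / (\lambda_4 + \lambda_5)$
       = 218.40 msec. + .1$Q_5(Z)$ + .2$Q_3(Z)$ + .2$Q_5(Y,Z)$

$b_{Y4} = (\lambda_1 RL_{1Y4} + \lambda_1 WL_{1Y4} + \lambda_3 RL_{3Y4} + \lambda_5 WL_{5Y4}) / (2\lambda_1 + \lambda_3 + \lambda_5)$
       = 506.78 msec. + $Q_3(Z)$ + .8889$Q_5(Y,Z)$

$b_{Y5} = (\lambda_1 WL_{1Y5} + \lambda_5 WL_{5Y5}) / (\lambda_1 + \lambda_5)$
       = 138.46 msec. + $Q_3(Z)$

$b_{Z5} = (\lambda_1 WL_{1Z5} + \lambda_4 RL_{4Z5} + \lambda_5 WL_{5Z5}) / (\lambda_1 + \lambda_4 + \lambda_5)$
       = 168.82 msec. + .8333$Q_3(Z)$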
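The rate-weighted averaging can be checked numerically for the constant (non-queueing) parts of three of these expressions. In the Python sketch below the class arrival rates are assumptions inferred from the printed weights (for example .5, .1667 and .333 for Classes 1, 4 and 5 at file X, node 1), and the helper name is hypothetical.

```python
# Assumed class arrival rates, inferred from the printed weights.
RATES = {1: 0.3, 2: 0.2, 3: 0.1, 4: 0.1, 5: 0.2}


def weighted_hold_time(hold_msec_by_class):
    """Average lock-holding time, weighting each class by its arrival rate."""
    total_rate = sum(RATES[c] for c in hold_msec_by_class)
    weighted = sum(RATES[c] * t for c, t in hold_msec_by_class.items())
    return weighted / total_rate


# Constant terms (msec.) of the lock-holding times quoted in the text.
b_x1_const = weighted_hold_time({1: 438.09, 4: 410.17, 5: 338.04})
b_y5_const = weighted_hold_time({1: 171.77, 5: 88.49})
b_z5_const = weighted_hold_time({1: 171.77, 4: 320.62, 5: 88.49})
print(b_x1_const, b_y5_const, b_z5_const)
```

With these assumed rates the constants come out near 400.09, 138.46 and 168.82 msec.; the queueing terms are weighted in the same way.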

In our example, it is noted that nodes 1, 2 and 3 contain only one
file each. Therefore, the in-service times at these nodes are given by
$b_{X1}$, $b_{X2}$ and $b_{Z3}$. However, at nodes 4 and 5, there are two files each.
When a transaction accesses these files, the locks must be requested in
serial order. Thus the in-service time of locking file W at node j when
the serial locking order is observed, denoted by $a_{Wj}$, must be calculated
as described in 6.2.2. For example, at node 5, $a_{Z5} = b_{Z5}$, but $a_{Y5} = b_{Y5} + S_{Z5}$,
i.e. the in-service time at the queue for file Y plus the total
service time at the queue for file Z. The queueing network (described in
6.2.2) corresponding to node 5 is shown in Fig. 7.9(a).

Let $\lambda_{Mj}$ and $\mu_{Mj}$ be the arrival and service rates of lock requests for
file M at node j, $w_{Mj}$ be the average queueing time, and $S_{Mj}$ be the average
total service time in the queue to lock file M at node j.

Consider node 3, $\mu_{Z3} = 1/b_{Z3} = 20.30$

$$Q_3(Z) = w_{Z3} = \frac{\lambda_{Z3}}{\mu_{Z3}(\mu_{Z3} - \lambda_{Z3})} = \frac{.8}{20.30(20.30 - .8)} = .00202 \text{ sec.}$$

Consider node 5, $b_{Y5}$ = 138.46 msec. + $Q_3(Z)$ = .14048 sec.

$a_{Z5} = b_{Z5}$ = 168.82 msec. + .8333$Q_3(Z)$ = .1705 sec.

Figure 7.9  Queueing Network Models for Nodes 4 and 5 in Example DDB

$$S_{Z5} = \frac{1}{\mu_{Z5} - \lambda_{Z5}} = \frac{1}{1/.1705 - .6} = .1899 \text{ sec.}$$

$Q_5(Z) = w_{Z5} = S_{Z5} - a_{Z5}$ = (.1899 - .1705) sec. = .0194 sec.

$a_{Y5} = b_{Y5} + S_{Z5}$ = .14048 sec. + .1899 sec. = .33038 sec.

Assuming that the service time is still exponential, $\mu_{Y5} = 1/a_{Y5} = 2.996$, and

$$Q_5(Y) = \frac{\lambda_{Y5}}{\mu_{Y5}(\mu_{Y5} - \lambda_{Y5})} = \frac{.5}{2.996(2.996 - .5)} = .06537 \text{ sec.}$$

$Q_5(Y,Z)$, the queueing time for both files Y and Z, $= Q_5(Y) + Q_5(Z)$ = .08477 sec., since requests for files Y and Z must be queued serially.

Consider node 4, $b_{X4}$ = 218.40 msec. + .1$Q_5(Z)$ + .2$Q_3(Z)$ + .2$Q_5(Y,Z)$ = .2377 sec.

$a_{Y4} = b_{Y4}$ = 506.78 msec. + $Q_3(Z)$ + .8889$Q_5(Y,Z)$ = .5842 sec.

$$S_{Y4} = \frac{1}{\mu_{Y4} - \lambda_{Y4}} = \frac{1}{1/.5842 - .9} = 1.2317 \text{ sec.}$$

$Q_4(Y) = w_{Y4} = S_{Y4} - a_{Y4}$ = .6475 sec.

$a_{X4} = \frac{1}{3} b_{X4} + \frac{2}{3}(b_{X4} + S_{Y4})$ = 1.0588 sec.

Assuming that the service time is still exponential, $\mu_{X4} = 1/a_{X4} = .94447$, and

$$Q_4(X) = \frac{\lambda_{X4}}{\mu_{X4}(\mu_{X4} - \lambda_{X4})} = \frac{.3}{.94447(.94447 - .3)} = .4929 \text{ sec.}$$

$Q_4(X,Y) = Q_4(X) + Q_4(Y)$ = 1.140 sec.

Consider node 1, $b_{X1}$ = 1.179 sec.

Therefore, $Q_1(X) = \dfrac{\lambda_{X1}}{\mu_{X1}(\mu_{X1} - \lambda_{X1})}$ = 2.85 sec.

Consider node 2, $b_{X2}$ = .7500 sec.

Therefore, $Q_2(X) = \dfrac{\lambda_{X2}}{\mu_{X2}(\mu_{X2} - \lambda_{X2})}$ = .4500 sec.
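The serial-queue calculation for nodes 3, 5, 4 and 2 can be retraced with the Python sketch below. It treats every queue as M/M/1, as the text assumes, and takes the in-service times from the text as inputs. The lock request rates .8, .6, .9 and .3 appear in the printed expressions; the rate .5 for file X at node 2 is an assumption inferred from the class rates. Function names are hypothetical.

```python
def mm1_wait(lam, service):
    """Mean M/M/1 queueing delay: lam * b^2 / (1 - lam * b)."""
    return lam * service ** 2 / (1.0 - lam * service)


def mm1_sojourn(lam, service):
    """Mean total time in an M/M/1 queue (waiting plus service): 1 / (mu - lam)."""
    return 1.0 / (1.0 / service - lam)


# Node 3: single file Z.
q3_z = mm1_wait(0.8, 0.04926)

# Node 5: file Z is locked last, so a_Z5 = b_Z5.
a_z5 = 0.1705
s_z5 = mm1_sojourn(0.6, a_z5)
q5_z = s_z5 - a_z5

# Node 4: file Y is locked last, so a_Y4 = b_Y4.
a_y4 = 0.5842
s_y4 = mm1_sojourn(0.9, a_y4)
q4_y = s_y4 - a_y4

# File X at node 4: one third of the requests need X only, two thirds
# go on to queue for Y while still holding X.
b_x4 = 0.2377
a_x4 = b_x4 / 3.0 + 2.0 * (b_x4 + s_y4) / 3.0
q4_x = mm1_wait(0.3, a_x4)

# Node 2: single file X (request rate .5 is an assumption).
q2_x = mm1_wait(0.5, 0.75)
print(q3_z, s_z5, q5_z, s_y4, q4_y, a_x4, q4_x, q2_x)
```

Small differences in the last digit against the printed values are to be expected from rounding of the inputs.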

We can now calculate the response time of the different transaction
classes. For Class i transactions, average response time under Distributed
Locking with Ordered Queues for Deadlock Prevention, $RDLOQ_i$ = read request
transmission time + queueing time for locks at first node accessed + time
locks held - lock release transmission time = queueing time for locks at
first node accessed + time locks held. (See Fig. 7.7)

Therefore,

$RDLOQ_1 = Q_1(X) + RL_{1X1}$
         = 2.85 sec. + 1.172 sec. = 4.02 sec.

$RDLOQ_2 = Q_2(X) + RL_{2X2}$
         = .45 sec. + .05939 sec. = .509 sec.

$RDLOQ_3 = Q_4(Y) + RL_{3Y4}$
         = .6475 sec. + .04446 sec. = .692 sec.

$RDLOQ_4 = E[\max(Q_1(X) + WL_{4X1},\; Q_2(X) + WL_{4X2},\; Q_4(X) + WL_{4X4})]$
         = $Q_1(X) + WL_{4X1}$ = 2.85 sec. + .4296 sec. = 3.28 sec.

$RDLOQ_5 = E[\max(Q_1(X) + WL_{5X1},\; Q_2(X) + WL_{5X2},\; Q_4(X,Y) + WL_{5X4})]$
         = $Q_1(X) + WL_{5X1}$ = 2.85 sec. + 1.56 sec. = 4.41 sec.
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7.3  Distributed  Two-Phase  Lock  with  Prioritized  Transactions  for  Deadlock 
Prevention 

In this example, Prioritized Transactions will be used to prevent
deadlocks.  This  is  more  efficient  than  the  Ordered  Queues  scheme  in  that 
more  concurrency  is  possible.  Whereas  in  the  Ordered  Queues  scheme,  locks 
have  to  be  obtained  in  serial  order,  one  after  another,  in  the  Prioritized 
Transactions  scheme  they  can  be  obtained  simultaneously.  The  tradeoff, 
however,  is  that  for  Prioritized  Transactions,  it  is  sometimes  necessary  to 
restart  some  transactions. 

Fig.  7.10  summarizes  the  sequence  of  events  corresponding  to  the 
processing  of  Class  k  transactions: 

(A) For each node $i \in RN_k$, the following is repeated:

(1)  send  read  request  to  node  i,  lock  request  is  piggy-backed  with  this 
read  request. 

(2)  queue  for  read  lock  at  node  i 

(3)  node  sends  lock  grant  to  request  node  and  initiates  file  transfer. 

Note that the nodes will be accessed simultaneously and the delay associated
with query processing is $\max_{i \in RN_k} R_i$ (See Fig. 7.10), where $R_i$ is the delay
associated with accessing node i.

(B) For each node $j \in WN_k$, the following is repeated:

(1)  send  pre-commit  to  node  j.  The  write  lock  request  is  piggy-backed 
with  it. 

(2)  queue  for  write  lock  at  node  j 

(3)  node  j  sends  acknowledgement  to  request  node. 

Again, all copies will be accessed simultaneously and the delay associated
with pre-commit is $\max_{j \in WN_k} W_j$ (See Fig. 7.10), where $W_j$ is the delay associated
with accessing node j.
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(C)  The  request  node  sends  commit  and  lock  release  messages. 

The  volume  of  messages  generated  in  the  communication  subnetwork  is 
similar  to  that  described  in  section  7.2.  It  is  therefore  assumed  that  the 
average  delays  on  the  communication  channels  are  the  same.  (See  Fig. 7. 8) 

We  next  calculate  the  length  of  time  each  transaction  holds  a  lock. 

Let $RL_{iWj}$ be the length of time a Class i transaction holds a read lock
on file W at node j, $WL_{iWj}$ be the time it holds a write lock, $Q_j(W)$ be the
queueing time for file W at node j, $MW_i = \max_{j \in WN_i} W_j$ be the delay associated
with pre-commit for Class i transactions, and $MR_i = \max_{j \in RN_i} R_j$ be the delay
associated with query processing for Class i transactions, then for Class 1
transactions: (see Fig. 7.10)

$WL_{1Wj}$ = delay due to pre-commit + lock release transmission time - pre-commit
transmission time - queueing time for write lock
mv.  ,  •  i 

MW  -  -  ’me. 

MW,  •  ,  •  .  ,  -  :  ,  -  r ,,  -  (Y! 

i  >  j  . .  .5  '  ■ 


W 1 ,  , . 

MW , 

-  ,  , Y 1  -  27  msec . 

w:V.:  ' 

MW, 

■»  c  +-  -  r  -  r 

11  V  11  31 

-  V>r(Z) 

,v'"‘  ’  l ;  •  - 

MW, 

-  (?.)  -  27  msec . 

W : .  i  _  , 

i  ! 

MW , 

i 

1?  <:4  i:  r24 

-  0  (Y) 

4 

w;.,  _ _ , 
i  V  ♦ 

MW , 

-  , • ,  ‘  Y ''  -  2”  msec- 

$RL_{1Wj}$ = delay due to query processing + delay due to pre-commit + lock release
transmission time - read request transmission time - queueing delay
Therefore,

RL 

1Y4 

RL,  .  = 

MR, 

+ 

MW 

IY4 

1 

1 

RL, 

MR 

+ 

MW 

1X1 

1 

1 

MRl  +  MW1  +  <s12  +  s24)  -  (s12  +  s24) 


For Class 2 transactions:


Ri 

-  MR_ 

+  (  R 

2  3 

2 

21 

RL 

-  MR 

-  Q,(2) 

223 

2 

3 

M,2X2 

-  mr2 

~  Q2  ( X) 

U21  *  “131  "  °3IZI 


For  Class  3  transactions: 


^323  =  MR3  -  ?3(Z) 


RL3Y4 

=  MR^ 

Rl3Y4 

=  MR^ 

For  Class  4 

WL4X4 

=  MW 

4 

WI,4X1 

=  MW, 
4 

WL4X1 

=  MW, 
4 

WL4X2 

=  MW, 
4 

^4X2 

=  MW, 
4 

W,425 

=  MR, 
4 

RI,42  5 

TT 

l£ 

II 

35  '54 


35  '  54 


43  31 


'43  31 


NL/v,  -  MW,:  +  <s„.  +  -  <r„c  -  -  Q,<X) 


45  '  52 


45  52 


?45  "  S45  ~  Q5<Z) 


4  -5 

For  Class  5  transactions: 
WL 


323 


53  ”  r 5 3  ~  °3(Z) 


Q4<y) 


MW,.  +  s 
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WL 

WL 

WL 

WL 

WL 

WL 

WL 

WL 

WL 

WL 

WL 


5Z3 

5ZS 

5Y5 

5X4 

5X4 

5Y4 

5Y4 

5X1 

5X1 

5X2 

5X2 


=  MWr  -  Q^(Z)  -  12  msec. 
5  3 


=  MWg  -  q5(z) 


=  MWC  -  Q  (Y) 
d  b 


=  *5  +  S54  -  r 54  -  VX) 


=  MWr  -  0„(X)  -  18  msec. 
5  '  4 


=  MW  +  s  .  -  r  -  O.(Y) 
5  54  54  4 


=  MW^  -  Q^(Y)  -  18  msec. 


=  MW5  +  (s53  +  s31)  +  (rS3  +  r31) 


=  MW5  -  0  (X)  -  23.24  msec 


=  ™5  +  S52  "  r 52  ~  Q2(X> 


=  MW,.  -  Q2(X)  ~  12  msec. 


Q  (X) 


We next find the length of time a lock is held at the different nodes.

Let $b_{Wj}$ be the average length of time a lock on file W is held at node j.

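$b_{X1} = (\lambda_1 RL_{1X1} + \lambda_4 WL_{4X1} + \lambda_5 WL_{5X1}) / (\lambda_1 + \lambda_4 + \lambda_5)$
       = .5$MR_1$ + .5$MW_1$ + .1667$MW_4$ + .333$MW_5$ - $Q_1(X)$ - 11.62 msec.

$b_{X2} = (\lambda_2 RL_{2X2} + \lambda_4 WL_{4X2} + \lambda_5 WL_{5X2}) / (\lambda_2 + \lambda_4 + \lambda_5)$
       = .4$MR_2$ + .2$MW_4$ + .4$MW_5$ - $Q_2(X)$ - 10.2 msec.

$b_{Z3} = (\lambda_1 WL_{1Z3} + \lambda_2 RL_{2Z3} + \lambda_3 RL_{3Z3} + \lambda_5 WL_{5Z3}) / (\lambda_1 + \lambda_2 + \lambda_3 + \lambda_5)$
       = .375$MW_1$ + .25$MR_2$ + .125$MR_3$ + .25$MW_5$ - $Q_3(Z)$ - 6.375 msec.

$b_{X4} = (\lambda_4 WL_{4X4} + \lambda_5 WL_{5X4}) / (\lambda_4 + \lambda_5)$
       = .333$MW_4$ + .667$MW_5$ - $Q_4(X)$ - 12 msec.

$b_{Y4} = (\lambda_1 RL_{1Y4} + \lambda_1 WL_{1Y4} + \lambda_3 RL_{3Y4} + \lambda_5 WL_{5Y4}) / (2\lambda_1 + \lambda_3 + \lambda_5)$
       = .333$MR_1$ + .667$MW_1$ + .111$MR_3$ + .222$MW_5$ - $Q_4(Y)$ - 13 msec.

$b_{Y5} = (\lambda_1 WL_{1Y5} + \lambda_5 WL_{5Y5}) / (\lambda_1 + \lambda_5)$
       = .6$MW_1$ + .4$MW_5$ - $Q_5(Y)$ - 16.2 msec.

$b_{Z5} = (\lambda_1 WL_{1Z5} + \lambda_4 RL_{4Z5} + \lambda_5 WL_{5Z5}) / (\lambda_1 + \lambda_4 + \lambda_5)$
       = .5$MW_1$ + .1667$MR_4$ + .1667$MW_4$ + .333$MW_5$ - $Q_5(Z)$ - 13.5 msec.        (7.4)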

Therefore, we have seven equations in seven unknowns, namely $Q_1(X)$,
$Q_2(X)$, $Q_3(Z)$, $Q_4(X)$, $Q_4(Y)$, $Q_5(Y)$ and $Q_5(Z)$. (Note that $b_{Wi}$ is related
to $Q_i(W)$ by the equation

$$Q_i(W) = \frac{\lambda}{\mu(\mu - \lambda)} = \frac{\lambda_{Wi}\, b_{Wi}^2}{1 - \lambda_{Wi}\, b_{Wi}}$$

where $\lambda_{Wi}$ is the lock request rate for file W at node i.) Of course, we have to determine $MR_i$, $MW_i$, i = 1,...,5 first.

For example, $MR_1 = \max(Q_1(X),\; s_{12} + s_{24} + Q_4(Y) + r_{43} + r_{31})$

$MR_1 = \max(Q_1(X),\; Q_4(Y) + 128.7 \text{ msec.})$

Expressions for $MR_1$, $MR_2$, etc. are difficult to obtain in closed
form. Therefore, we make the additional assumption that the delay
corresponding to accessing each node, i.e. the transmission time plus queueing
time for locks, is exponentially distributed. Since the expected value of
the maximum of several exponentials has been derived in Appendix IV, we
can find closed form expressions for $MR_1$, $MR_2$, etc. For example,

$$MR_1 = Q_1(X) + Q_4(Y) - \frac{1}{1/Q_1(X) + 1/Q_4(Y)}.$$
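For two independent exponential delays with means a and b, the expected maximum is a + b - 1/(1/a + 1/b), since the minimum of the two is itself exponential with rate 1/a + 1/b. The Python sketch below illustrates that standard identity only; it does not reproduce Appendix IV, and the function name is hypothetical.

```python
def expected_max_two_exponentials(mean_a, mean_b):
    """E[max(A, B)] for independent exponentials with the given means.

    E[max] = E[A] + E[B] - E[min], and min(A, B) is exponential with
    rate 1/mean_a + 1/mean_b.
    """
    expected_min = 1.0 / (1.0 / mean_a + 1.0 / mean_b)
    return mean_a + mean_b - expected_min


# Two identical exponentials with mean 1: the maximum has mean 1.5.
print(expected_max_two_exponentials(1.0, 1.0))
# Illustrative means of .134 sec. and .166 sec.
print(expected_max_two_exponentials(0.134, 0.166))
```

The result always lies between the larger of the two means and their sum, which is a quick sanity check when the formula is applied by hand.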


After substituting these expressions for $MR_i$, $MW_i$ into Equations (7.4),
we can solve them simultaneously, using an iterative procedure.

The procedure converges after six iterations and outputs the
following:


$b_{X1}$ = .421 sec.      $Q_1(X)$ = .134 sec.
$b_{X2}$ = .173 sec.      $Q_2(X)$ = .164 sec.
$b_{Z3}$ = .226 sec.      $Q_3(Z)$ = .0579 sec.
$b_{X4}$ = .194 sec.      $Q_4(X)$ = .0119 sec.
$b_{Y4}$ = .355 sec.      $Q_4(Y)$ = .166 sec.
$b_{Y5}$ = .256 sec.      $Q_5(Y)$ = .0375 sec.
$b_{Z5}$ = .278 sec.      $Q_5(Z)$ = .0557 sec.

$MR_1$ = .511 sec.    $MR_2$ = .171 sec.    $MR_3$ = .423 sec.    $MR_4$ = .196 sec.
$MW_1$ = .407 sec.    $MW_4$ = .334 sec.    $MW_5$ = .308 sec.

$RL_{1X1}$ = .784 sec.    $RL_{1Y4}$ = .752 sec.    $WL_{1Y4}$ = .214 sec.
$WL_{1Y5}$ = .343 sec.    $WL_{1Z3}$ = .340 sec.    $WL_{1Z5}$ = .324 sec.
$RL_{2X2}$ = .007 sec.    $RL_{2Z3}$ = .113 sec.
$RL_{3Y4}$ = .365 sec.    $RL_{3Z3}$ = .365 sec.
$RL_{4Z5}$ = .474 sec.    $WL_{4X1}$ = .198 sec.    $WL_{4X2}$ = (illegible)    $WL_{4X4}$ = .322 sec.
$WL_{5X1}$ = .172 sec.    $WL_{5X2}$ = .132 sec.    $WL_{5X4}$ = .278 sec.
$WL_{5Y4}$ = .124 sec.    $WL_{5Y5}$ = .271 sec.    $WL_{5Z3}$ = .249 sec.    $WL_{5Z5}$ = .252 sec.
We next calculate the probability a transaction will be restarted
under the wound-wait deadlock prevention scheme. It can be shown that for
this particular example, wound-wait induces fewer restarts than wait-die.
Since two transactions conflict if at least one of them is a write
transaction, there are three distinct cases to consider: (1) a Class i
transaction $T_i$ owns a read lock on a data item X at node r and a Class j
transaction $T_j$ tries to get a write lock on X at the same node, (2) $T_i$ owns a
write lock on Y at node w and a Class k transaction $T_k$ tries to get a read
lock on Y at node w, and (3) $T_i$ owns a write lock on Y at node w and a
Class m transaction $T_m$ tries to get a write lock on Y at node w.

Recall that under wound-wait, every transaction is given a timestamp
(its priority) when it enters the system, and a transaction will be
restarted if a conflicting transaction with higher priority (older
timestamp) is forced to wait for it to release a lock.

Inspection of Fig. 7.10 lets us construct the three scenarios
corresponding to transaction restarts. (See Fig. 7.11 (a), (b), (c).) Let
$a_i$ be the interarrival time of $T_i$. In Case 1 (Fig. 7.11(a)), $T_i$ will be
restarted if a conflicting $T_j$ with older timestamp arrives at node r
after $T_i$ does. Therefore

$PR_{ijr}$ = P($T_i$ restarted by $T_j$ at node r)
         = P( AB < AD < AC )
         = P( AB < AD ) P( AD < AC | AB < AD )
         = P( AB < AD ) P( AD - AB < AC - AB | AB < AD )
         = P( AB < AD ) P( AD - AB < BC | AD - AB > 0 )

We now make the additional assumption that AD and BC are exponentially
distributed, then

[Figure 7.11: timelines for (a) Case 1: $T_i$ restarted because of read-write conflict; (b) Case 2: $T_i$ restarted because of write-read conflict; (c) Case 3: $T_i$ restarted because of write-write conflict.]

Figure 7.11  Finding Probability and Delay of Transaction Restarts
$$PR_{ijr} = P(MR_j + r_{jr} > a_j)\; P(MR_j + r_{jr} > s_{ir}) \cdot P(Q_r(X) + RL_{iXr} > MR_j + r_{jr}) \tag{7.5}$$

$W_{ijr}$ = E( delay for $T_i$ due to rejection by $T_j$ at node r )
        = time wasted in processing the aborted transaction + transmission delay from node r to node i to report the abortion
        = $E(s_{ir})$ + E( BD | AB < AD < AC ) + $E(s_{ri})$
        = $E(s_{ir})$ + $E(s_{ri})$ + E( BD | BD < BC )
        = $E(s_{ir})$ + $E(s_{ri})$ + E( min( BD, BC ) )

See Appendix I for a derivation of E( BD | BD < BC ) = E( min( BD, BC ) ).

Hence,

$$W_{ijr} = E(s_{ir}) + E(s_{ri}) + E(\min(r_{jr} + MR_j,\; Q_r(X) + RL_{iXr})) \tag{7.6}$$

In Case 2 (Fig. 7.11(b)), $PR_{ikw}$ = P( $T_i$ rejected by $T_k$ at node w ) = P( AB < AD < AC )

$$PR_{ikw} = P(s_{kw} > a_k)\; P(s_{kw} > MR_i)\; P(s_{kw} > r_{iw}) \cdot P(Q_w(Y) + WL_{iYw} > s_{kw}) \tag{7.7}$$

and $W_{ikw}$ = E( delay for $T_i$ due to rejection by $T_k$ at node w )

$$W_{ikw} = E(s_{wi}) + E(MR_i) + E(s_{iw}) + E(\min(s_{kw},\; Q_w(Y) + WL_{iYw})) \tag{7.8}$$

In Case 3 (Fig. 7.11(c)), $PR_{imw}$ = P( $T_i$ rejected by $T_m$ at node w ) = P( AB < AD < AC )

$$PR_{imw} = P(MR_m + s_{mw} > a_m)\; P(MR_m + s_{mw} > MR_i) \cdot P(MR_m + s_{mw} > r_{iw})\; P(Q_w(Y) + WL_{iYw} > MR_m + s_{mw}) \tag{7.9}$$

and $W_{imw}$ = E( delay for $T_i$ due to rejection by $T_m$ at node w )

$$W_{imw} = E(s_{wi}) + E(MR_i) + E(r_{iw}) + E(\min(MR_m + s_{mw},\; Q_w(Y) + WL_{iYw})) \tag{7.10}$$

We can now calculate the probability of restart for each transaction
class. Consider Class 1 transactions:

$P_1$ = P( $T_1$ is restarted )
      = 1 - P( $T_1$ not restarted )
      = $1 - \prod_{i \neq 1} \prod_{a}$ P( $T_1$ not restarted by $T_i$ at node a )

since (1) the probabilities that a transaction is restarted at the same node by
different transactions are independent, and (2) the probabilities that a
transaction is restarted at different nodes by the same transaction (or by
different transactions) are independent.

Fig. 7.2 shows at which nodes the transaction classes conflict. For
example, $T_1$ and $T_2$ conflict at node 3. Equ. (7.7) gives

$PR_{123} = P(s_{23} > a_2)\; P(s_{23} > MR_1)\; P(s_{23} > r_{13})\; P(Q_3(Z) + WL_{1Z3} > s_{23})$

$$= \frac{\lambda_2}{\lambda_2 + \mu_{23}} \cdot \frac{1.957}{1.957 + \mu_{23}} \cdot \frac{\nu_{13}}{\nu_{13} + \mu_{23}} \cdot \frac{\mu_{23}}{\mu_{23} + 2.513} = .00042$$

where $\mu_{ij} = 1/E(s_{ij})$, $\nu_{ij} = 1/E(r_{ij})$, and it is assumed that $MR_1$ and
$Q_3(Z) + WL_{1Z3}$ are exponentially distributed. Equ. (7.8) gives

$W_{123} = E(s_{31}) + MR_1 + E(s_{13}) + E(s_{23})$ = .609 sec.
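Each factor in the restart probability is a comparison of two independent exponential delays, for which P(A > B) = (rate of B) / (rate of A + rate of B). The Python sketch below illustrates this building block and the Case 2 product of Equ. (7.7). The function names are hypothetical, and the demonstration call at the end uses made-up unit values, because the transmission delays needed for the real example come from Fig. 7.8, which is not reproduced here.

```python
def prob_first_exceeds(rate_a, rate_b):
    """P(A > B) for independent A ~ Exp(rate_a) and B ~ Exp(rate_b)."""
    return rate_b / (rate_a + rate_b)


def restart_probability_case2(lam_k, u_kw, mean_mr_i, v_iw, mean_hold):
    """Product of the four factors of Equ. (7.7), all delays taken as exponential.

    lam_k:     arrival rate of Class k (rate of a_k)
    u_kw:      rate of the short-message delay s_kw
    mean_mr_i: mean of MR_i
    v_iw:      rate of the long-message delay r_iw
    mean_hold: mean of Q_w(Y) + WL_iYw
    """
    return (prob_first_exceeds(u_kw, lam_k)                 # P(s_kw > a_k)
            * prob_first_exceeds(u_kw, 1.0 / mean_mr_i)     # P(s_kw > MR_i)
            * prob_first_exceeds(u_kw, v_iw)                # P(s_kw > r_iw)
            * prob_first_exceeds(1.0 / mean_hold, u_kw))    # P(Q + WL > s_kw)


# Rates used in the text: 1/MR_1 and 1/(Q_3(Z) + WL_1Z3).
rate_mr1 = 1.0 / 0.511
rate_hold = 1.0 / (0.0579 + 0.340)
print(round(rate_mr1, 3), round(rate_hold, 3))

# Made-up unit values, for illustration only: every factor equals 1/2.
print(restart_probability_case2(1.0, 1.0, 1.0, 1.0, 1.0))
```

The two printed rates agree with the constants 1.957 and 2.513 that appear in the expression for the restart probability.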


Similarly, we can find the probability a Class i transaction will
be rejected by a Class j transaction at node k (i, j, k = 1, 2, 3, 4, 5)
and the expected delay due to the rejection.

$P_i$ = P( $T_i$ is restarted )
      = $1 - \prod_{j \neq i} \prod_{k=1,\ldots,5}$ P( $T_i$ not restarted by $T_j$ at node k )

$W_i$ = E( delay for $T_i$ due to rejection ) = $\sum_{j,k} PR_{ijk} \cdot W_{ijk}$
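Under the independence assumption stated above, combining the pairwise restart probabilities is a simple product. The Python sketch below shows the two aggregation steps; the function names and the sample inputs are hypothetical and are not values from the example.

```python
from math import prod


def restart_probability(pairwise_probs):
    """P(restart) = 1 - product of (1 - PR) over all conflicting (class, node) pairs."""
    return 1.0 - prod(1.0 - p for p in pairwise_probs)


def expected_rejection_delay(pairs):
    """Sum of PR * W over all conflicting (class, node) pairs."""
    return sum(p * w for p, w in pairs)


# Hypothetical inputs: two possible conflicts with probabilities .1 and .2.
print(restart_probability([0.1, 0.2]))
print(expected_rejection_delay([(0.1, 0.5), (0.2, 0.25)]))
```

With no conflicting pairs the restart probability is zero, as expected for a class that never competes for a lock.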
The following values of $P_i$ and $W_i$ were calculated:

$P_1$ = .0483,    $W_1$ = .0102 sec.
$P_2$ = .0540,    $W_2$ = .0092 sec.
$P_3$ = .1325,    $W_3$ = .0421 sec.
$P_4$ = .0081,    $W_4$ = .0307 sec.
$P_5$ = .2171,    $W_5$ = .0607 sec.

The average response time of the 5 Classes of transactions can now
be calculated. For Class i transactions, average response time under
Distributed Locking with Prioritized Transactions for Deadlock Prevention,

$RDLPT_i = MR_i + MW_i$ + delay due to rejection
         $= MR_i + MW_i + P_i W_i$    (see Fig. 7.10)

Hence,

$RDLPT_1$ = .918 sec.
$RDLPT_2$ = .171 sec.
$RDLPT_3$ = .429 sec.
$RDLPT_4$ = .530 sec.
$RDLPT_5$ = .321 sec.
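The response times can be retraced from the quantities computed earlier in this section. In the Python sketch below, the delays for empty readsets or writesets (MW for Classes 2 and 3, MR for Class 5) are taken as zero, which is an assumption, and the Class 5 pre-commit delay is taken as .308 sec., the value consistent with the lock-holding times listed for that class. The function name is hypothetical.

```python
# (MR_i, MW_i, P_i, W_i) per class, in seconds; zeros are assumptions
# for classes with an empty readset or writeset.
CLASSES = {
    1: (0.511, 0.407, 0.0483, 0.0102),
    2: (0.171, 0.0,   0.0540, 0.0092),
    3: (0.423, 0.0,   0.1325, 0.0421),
    4: (0.196, 0.334, 0.0081, 0.0307),
    5: (0.0,   0.308, 0.2171, 0.0607),
}


def response_time(mr, mw, p_restart, w_delay):
    """RDLPT = query delay + pre-commit delay + expected delay due to rejection."""
    return mr + mw + p_restart * w_delay


rdlpt = {cls: response_time(*params) for cls, params in CLASSES.items()}
for cls in sorted(rdlpt):
    print(f"RDLPT_{cls} = {rdlpt[cls]:.3f} sec")
```

Under these assumptions the five results agree with the values listed above to within rounding; the rejection term contributes at most about 13 msec. (Class 5), so the response times are dominated by the query and pre-commit delays.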
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7.4  SDD-1 


In this example, we shall calculate the response time for the five classes
of transactions under the SDD-1 Concurrency Control Algorithm. We use
the same notation as in section 7.1 and make the same assumptions.

The  volume  of  messages  generated  by  SDD-1  is  similar  to  that  described 
in  section  7.2.  It  is  therefore  assumed  that  the  average  delays  on  the 
communication  channels  are  the  same  (as  shown  in  Fig.  7.8). 

Let $\bar{s}_{ij}$, $\bar{r}_{ij}$ denote respectively the average delay of a short and a
long message between nodes i and j. Since transmission delays are assumed
to be exponentially distributed, the parameters of the exponential
distributions corresponding to short and long messages are given by
$\mu_{ij} = 1/\bar{s}_{ij}$ and $\nu_{ij} = 1/\bar{r}_{ij}$.

Let us first construct the conflict graph for our five classes of
transactions. The conflict graph (see Fig. 7.12) consists of nodes representing
the readsets and writesets of the transaction classes. The links on the
graph indicate potential conflict between the transactions. Therefore, two
nodes are connected if at least one of them is a writeset and they have at
least one file in common.
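The linking rule just stated can be sketched in a few lines of Python. This is only an illustration: the function name `build_conflict_graph` and the two-class readsets and writesets below are hypothetical and are not the five classes of the example.

```python
from itertools import combinations


def build_conflict_graph(readsets, writesets):
    """Return the edges of a conflict graph as a set of frozensets.

    Nodes are ("r", c) and ("w", c) for each transaction class c.
    Two nodes are linked when at least one of them is a writeset
    and the two sets have at least one file in common.
    """
    nodes = [("r", c, set(files)) for c, files in readsets.items()]
    nodes += [("w", c, set(files)) for c, files in writesets.items()]
    edges = set()
    for (kind_a, cls_a, files_a), (kind_b, cls_b, files_b) in combinations(nodes, 2):
        involves_write = "w" in (kind_a, kind_b)
        if involves_write and files_a & files_b:
            edges.add(frozenset({(kind_a, cls_a), (kind_b, cls_b)}))
    return edges


# Hypothetical data (assumption): two classes over files X, Y, Z.
example_readsets = {1: "XY", 2: "XZ"}
example_writesets = {1: "X", 2: "Z"}
example_edges = build_conflict_graph(example_readsets, example_writesets)
```

Two readsets are never linked, since neither is a writeset; a readset and a writeset of the same class are linked whenever they share a file.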

In SDD-1 [BSR80], the conflict graph is analyzed during database
design and synchronization protocols are devised to maintain serializability.

It is found that three protocols P1, P2, and P3 are necessary. A fourth
protocol, P4, is sometimes invoked to improve on the efficiency of the other
three protocols. The SDD-1 protocol selection rules (Fig. 7.13) state which
protocols should be invoked by which transactions.


\ 


Figure 7.12  Conflict Graph for Transaction Classes in SDD-1 Example
(figure not reproduced; surviving labels: transaction classes 1, 2, 3, 4
with readsets XY, XZ, YZ, Z)


(a)  Transactions in class i must obey P1 with respect to
transactions in class j

(b)  Transactions in class i must obey P2 with respect to
transactions in classes j and k

(c)  Transactions in class i must obey P3 with respect to
transactions in class j

Figure 7.13  SDD-1 Protocol Selection Rules (Adapted from [BSR80])

SDD-1 Protocol Selection Rules (adapted from [BSR80])

(1)  For all classes i and j such that $(r^i, w^j)$ is in the conflict graph,
transactions in i must obey protocol P1 with respect to transactions
in j (see Fig. 7.13(a)).

(2)  For each cycle in the conflict graph the following hold:

(a)  for all distinct classes i, j, k, if edges $(r^i, w^j)$ and $(r^i, w^k)$
lie on the cycle, then transactions in i must obey P2 with respect
to transactions in j and k (see Fig. 7.13(b)), and

(b)  for all distinct classes i and j such that $(r^i, w^j)$ and $(r^j, w^i)$
lie on the cycle, transactions in i must obey P3 with respect
to transactions in j (see Fig. 7.13(c)).

Briefly,  these  protocols  serve  the  following  purposes: 

P1  Prevents read messages from one transaction that conflict with write
messages from another transaction from being processed in different
relative orders at different DMs.

P2  Prevents a read message from seeing write messages from two other
transactions in reverse timestamp order.

P3  Prevents two transactions that read each other's output from both
reading before either writes, i.e., prevents a classical race condition.

According to the SDD-1 protocol selection rules and the conflict
graph (Fig. 7.12), it is necessary that:

Class 1 transactions run P3 against Class 5 transactions

Class 2 transactions run P2 against Classes 1 and 5

Class 3 transactions run P2 against Classes 1 and 5

Class 4 transactions run P2 against Classes 1 and 5

Class 4 transactions run P3 against Class 5

The last two requirements are equivalent to:

Class 4 runs P2 against Class 1, and Class 4 runs P3 against Class 5.

An inspection of Fig. 7.2 shows at which nodes these protocols are to
be executed for each transaction.

Consider a Class 1 transaction $T_1$. It is running P3 against Class 5
transactions $T_5$. Thus when $T_1$ tries to read file Y at node 4, where $T_5$
is writing file Y, the protocol is invoked. The timestamp of $T_1$ must be
smaller than that of file Y, in order for $T_1$ not to be rejected.

Let $P_{ij}^{k}$ = P($T_i$ rejected by $T_j$ at node k),

$W_{ij}^{k}$ = time $T_i$ has to wait at node k until its read condition
against $T_j$ is satisfied,

$D_{ij}^{k}$ = delay of $T_i$ due to read rejection by $T_j$ at node k,

$P_i$ = P($T_i$ is rejected),

$W_i$ = delay of $T_i$ corresponding to query processing,

$D_i$ = delay of $T_i$ due to read rejection.

Therefore, $P_1 = P_{15}^{4}$, since $T_1$ only needs to run the synchronization protocol
against $T_5$ at node 4. Equ.(6.12) gives

$$P_1 = P_{15}^{4} = P(s_{14} > a_5 + r_{54}) = \frac{\lambda_5}{\lambda_5 + \mu_{14}} \cdot \frac{\nu_{54}}{\nu_{54} + \mu_{14}} = .00325,$$

where $a_5$ = interarrival time of Class 5 transactions at node 5. (Recall
that we assume read messages are short and write messages are long.) Each
rejection incurs an additional delay $D_1 = D_{15}^{4}$ = round-trip delay from
node 1 to node 4 = $\bar{s}_{14} + \bar{s}_{41}$ = .105 sec. If not rejected, $T_1$ must wait
at node 4 until its read condition is satisfied. Equ.(6.14) gives this
expected wait as

$$E(W_{15}^{4}) = \frac{1}{\nu_{54}} + \frac{\mu_{14}}{\mu_{14} + \lambda_5} \cdot \frac{1}{\lambda_5} = 5.05 \text{ sec.}$$

This wait can be reduced if node 5 sends periodic nullwrites.

Suppose node 5 sends nullwrites whenever the time since the last write
message is greater than 1 second; then Equ.(6.28) and (6.29) give
$P_{15}^{4}$ = .0334 and $E(W_{15}^{4})$ = .3974 sec.
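The rejection probability has the form P(s > a + r) for independent exponential delays. By the memoryless property this factors as λ/(λ+μ) · ν/(ν+μ), where μ, λ and ν are the rates of s, a and r. The sketch below evaluates that product and checks it by simulation. It also evaluates an expected wait of the form 1/ν + μ/(μ+λ) · 1/λ, which is how Equ.(6.14) is read here (an assumption). All function names and the rates used are hypothetical; they are not the parameters of the example.

```python
import random


def reject_probability(lam, mu, nu):
    """P(s > a + r) for independent s~Exp(mu), a~Exp(lam), r~Exp(nu)."""
    return (lam / (lam + mu)) * (nu / (nu + mu))


def expected_wait(lam, mu, nu):
    """Expected wait of the assumed form 1/nu + mu/(mu+lam) * 1/lam."""
    return 1.0 / nu + (mu / (mu + lam)) / lam


def simulate_reject_probability(lam, mu, nu, trials=200_000, seed=0):
    """Monte Carlo estimate of P(s > a + r) with a fixed seed."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        s = rng.expovariate(mu)   # short (read) message delay
        a = rng.expovariate(lam)  # interarrival time of the writer
        r = rng.expovariate(nu)   # long (write) message delay
        if s > a + r:
            hits += 1
    return hits / trials


# Hypothetical rates (assumption), chosen only to exercise the functions.
exact = reject_probability(1.0, 1.0, 1.0)
estimate = simulate_reject_probability(1.0, 1.0, 1.0)
```

With all three rates equal, each factor is 1/2, so the closed form gives 1/4 and the simulated estimate should lie close to it.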

Let $RSDD_1$ be the response time of transaction $T_1$ under SDD-1; then

$RSDD_1$ = E(delay due to read rejection) + query processing delay
+ write delay

$$= (\bar{s}_{14} + \bar{s}_{41})/(1 - P_1) + \bar{s}_{14} + E(W_{15}^{4}) + \bar{r}_{41} + \max(\bar{r}_{12} + \bar{r}_{24} + \bar{s}_{43} + \bar{s}_{31},\ \ldots) = .724 \text{ sec.}$$

where $1/(1 - P_1)$ is the expected number of rejections.

We next consider Class 2 transactions $T_2$. $T_2$ runs P2 against Classes
1 and 5. An inspection of Fig. 7.2 shows that $T_2$ runs P2 against $T_1$ at node
3, P2 against $T_5$ at node 2 and P2 against $T_5$ at node 3. Using Equ. (6.28) and
(6.29), we find $P_{21}^{3}$ = .03128 and $E(W_{21}^{3})$ = .3317 sec.

Similarly,  P  ^  =  .02099 

EW  *)  =  .361  sec, 

E(W  *>  =  .395  sec. 

Therefore, from Equ. (III.1),

$$P_2 = 1 - P(T_2 \text{ not rejected}) = 1 - (1 - P_{21}^{3})(1 - P_{25}^{2})(1 - P_{25}^{3}) = .052$$

$D_2$ = E(delay due to rejection)

P21  (a21+S12>  +  P25(S23  +  S32) 

3  1 

!’  IF 

21  25 


= .041 sec.

If $T_2$ is not rejected, then the delay corresponding to query processing,
$W_2$, is given by Equ.(III.2) as follows:

$$W_2 = \max\{\, s_{23} + \max(W_{21}^{3}, W_{25}^{3}) + r_{32},\ \ s_{22} + W_{25}^{2} + r_{22} \,\}$$

Therefore

$$E(W_2) \approx \max\{\, \bar{s}_{23} + \max(E(W_{21}^{3}), E(W_{25}^{3})) + \bar{r}_{32},\ \ \bar{s}_{22} + E(W_{25}^{2}) + \bar{r}_{22} \,\} = .447 \text{ sec.}$$

Hence, $RSDD_2 = D_2/(1 - P_2) + E(W_2)$ = .490 sec.


Consider Class 3 transactions $T_3$. $T_3$ runs P2 against $T_1$ and $T_5$ at node
3 and P2 against $T_1$ and $T_5$ at node 4.

$$P_3 = P(T_3 \text{ rejected}) = 1 - (1 - P_{31}^{3})(1 - P_{31}^{4})(1 - P_{35}^{3})(1 - P_{35}^{4})$$

$$= 1 - (1 - 0)(1 - .0910)(1 - 0)(1 - .0636) = .149$$

$$D_3 = E(\text{delay due to rejection}) = \frac{P_{31}^{4}(\bar{s}_{34} + \bar{s}_{43}) + P_{35}^{4}(\bar{s}_{34} + \bar{s}_{43})}{P_{31}^{4} + P_{35}^{4}} = \bar{s}_{34} + \bar{s}_{43} = .105 \text{ sec.}$$

If $T_3$ is not rejected, then the delay corresponding to query processing,
$W_3$, is given by

$$W_3 = \max\{\, s_{33} + \max(W_{31}^{3}, W_{35}^{3}) + r_{33},\ \ s_{34} + \max(W_{31}^{4}, W_{35}^{4}) + r_{43} \,\}$$

$$E(W_3) \approx \max\{\, \bar{s}_{33} + \max(E(W_{31}^{3}), E(W_{35}^{3})) + \bar{r}_{33},\ \ \bar{s}_{34} + \max(E(W_{31}^{4}), E(W_{35}^{4})) + \bar{r}_{43} \,\} = .499 \text{ sec.}$$

Hence, $RSDD_3 = D_3/(1 - P_3) + E(W_3)$ = .622 sec.
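The arithmetic for Classes 2 and 3 can be replayed with a short script. It combines the pairwise rejection probabilities as 1 minus the product of the acceptance probabilities, then forms the response time as D/(1 - P) + E(W). The helper names are hypothetical; the numbers are those quoted in the text.

```python
from math import prod


def overall_reject_probability(pairwise):
    """1 - product of (1 - p) over the pairwise rejection probabilities."""
    return 1.0 - prod(1.0 - p for p in pairwise)


def sdd1_response_time(rejection_delay, reject_prob, query_delay):
    """D / (1 - P) + E(W), the form used for Classes 2 and 3."""
    return rejection_delay / (1.0 - reject_prob) + query_delay


# Class 3: pairwise rejection probabilities quoted in the text.
p3 = overall_reject_probability([0.0, 0.0910, 0.0, 0.0636])

# Class 2: D2 = .041 sec, P2 = .052, E(W2) = .447 sec.
rsdd2 = sdd1_response_time(0.041, 0.052, 0.447)

# Class 3: D3 = .105 sec, P3 = .149, E(W3) = .499 sec.
rsdd3 = sdd1_response_time(0.105, 0.149, 0.499)

print(round(p3, 3), round(rsdd2, 3), round(rsdd3, 3))
```

Rounded to three decimals, the script reproduces the quoted values .149, .490 and .622.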


Consider Class 4 transactions $T_4$. $T_4$ runs P2 against $T_1$ at nodes 1 and
5, and P3 against $T_5$ at node 5.

$$P_4 = P(T_4 \text{ rejected}) = 1 - (1 - P_{41}^{1})(1 - P_{41}^{5})(1 - P_{45}^{5})$$

$$= 1 - (1 - .1131)(1 - .0267)(1 - .0804) = .206$$

$$D_4 = E(\text{delay due to rejection}) = \frac{P_{41}^{1}(\bar{s}_{41} + \bar{s}_{14}) + P_{41}^{5}(\bar{s}_{45} + \bar{s}_{54}) + P_{45}^{5}(\bar{s}_{45} + \bar{s}_{54})}{P_{41}^{1} + P_{41}^{5} + P_{45}^{5}} = .116 \text{ sec.}$$

If $T_4$ is not rejected, then the expected delay corresponding to query
processing is

$$E(W_4) \approx \max\{\, \bar{s}_{41} + E(W_{41}^{1}) + \bar{r}_{14},\ \ \bar{s}_{45} + \max(E(W_{41}^{5}), E(W_{45}^{5})) + \bar{r}_{54} \,\} = .542 \text{ sec.}$$

Hence,  RSDD^  =  D4/(l  -  P4)  +  F  (W  )  +  r43  +  +  r x2  +  s24  =  .826  sec. 

Class 5 transactions do not have to observe any of the protocols;
therefore


$RSDD_5$ = write delay

=  max.  (r  3+s  ,  r  +r  +s  +s  ,  r  +s  +s  +s  , 


.143 sec.

7.5  Discussion of Numerical Examples

The results for the four examples described in sections 7.1 - 7.4 are
summarized as follows:

Response times (sec.) of transaction classes

Concurrency Control Algorithm                 1      2      3      4      5    All Classes*

1. Centralized Two-phase Locking            .507   .220   .256   .775   .487     .441

2. Distributed Locking Ordered Queues
   for Deadlock Prevention                  4.02   .509   .692   3.28   4.41    2.874

3. Distributed Locking Prioritized
   Transactions for Deadlock Prevention     .918   .171   .429   .530   .321     .522

4. SDD-1                                    .724   .490   .622   .826   .143     .543

Although we have used an arbitrary example to compare the different
algorithms, and any conclusions drawn from these results may not apply
in general, it does seem clear that Algorithm 2 gives the worst response
times. This is mainly because of the requirement that files have to be
locked in a specific order. This requirement does not allow much concurrency.
The numerical results do not let us distinguish the performance of
Algorithms 1, 3 and 4. Which algorithm is better depends on the network
topology and such database parameters as arrival rates of transactions, size
of write-sets, readsets, etc. For example, if the transaction arrival rates
increase, Algorithm 1 will give longer response times since the central
node becomes a bottleneck. This does not happen in this example.

*The response time for all classes is a weighted average (by transaction
class arrival rate) of the response time for each individual class.

In  general,  we  believe  that  the  more  concurrency  an  algorithm  allows, 
the  smaller  its  average  response  time.  Thus,  we  would  expect  SDD-1  and 
Distributed Locking to give better response times than Centralized Locking.

CHAPTER 8
CONCLUSIONS


8.1  Conclusions

In this thesis, we have developed a performance model of a distributed
database system, which can be used as a tool to compare the performance
of different concurrency control algorithms.

We started by developing a network of queues model of the communication
subnetwork. We originally attempted to employ Jackson's Model but
concluded that Jackson's Model is inadequate for our purposes. The
Independent Queues Model that we employed in this thesis makes somewhat
stronger assumptions than Jackson's Model, but has more flexibility and
better approximates a real communication subnetwork. Modelling the
communication subnetwork accurately is important because one of the major
costs of operating a DDB is the communication delay.

We found that in a general DDB, concurrency control algorithms could
not be modelled accurately without taking into consideration the particular
query processing strategy employed. Previous authors have gotten around
the problem by assuming a fully redundant database. We found this
assumption unacceptable and therefore attempted to develop a new query
processing strategy that is easy to analyze. Our efforts resulted in the
MST and the MDT Algorithms, which are not only easy to analyze but also
easy to implement.

Having modelled the conflicts among different transactions in the
DDB for the resources of the communication subnetwork by the Independent
Queues Model, we then developed conflict models to analyze the conflict
among transactions for the resources of the database management system.

A different conflict model must be developed for each concurrency control
algorithm. Fortunately, although the literature is full of concurrency
control methods, most are variations of two major approaches, namely
two-phase locking and timestamp ordering. Four different conflict
models were developed: Centralized Two-Phase Locking with Deadlock
Detection, Distributed Two-Phase Locking with Ordered Queues for Deadlock
Prevention, Distributed Two-Phase Locking with Prioritized Transactions
for Deadlock Prevention, and SDD-1.

Four  numerical  examples  using  a  common  communication  subnetwork 
were  used  to  demonstrate  how  our  performance  model  could  be  used  to 
analyze  these  four  concurrency  control  algorithms. 

One would hope that at the end of a study such as this, one could
draw some conclusions as to which concurrency control algorithm is the
best. Unfortunately, the most general conclusion we can draw is that
which algorithm is better depends very much on the particular communication
subnetwork and the DDB system.

8.2  Further  Research 

In this thesis we have touched upon many different aspects of a DDB
management system. Due to time constraints, we have not been able to
study all the different problems in as much depth as we would have liked.

A number of open problems remain; listed below are some suggestions
for further research:

(1)  Our study recognizes that communication links and computers are not
perfectly reliable and incorporates some of the features database
systems use to guard against such failures, such as two-phase
commit. However, we did not analyze the impact of such failures
in terms of extra delay.

(2)  Our conflict models assume exponential end-to-end transmission delays.
New conflict models using other distributions of end-to-end delays
can be developed in a similar fashion.

(3)  We believe that the MST and MDT Algorithms for query processing are
easy to analyze and to implement. However, they suffer from very
strict assumptions. The two algorithms might be extended by
relaxing some of these assumptions.

(4)  The MST and MDT Algorithms, because of their unrealistic assumptions,
are actually heuristics, as is Wong's Algorithm [WONG77]. It would
be interesting to compare them.

(5)  One important part of model development that we have not studied in
this thesis is that of model validation. Unfortunately, since there
are no operating commercial systems, the only way to validate our
performance model is by simulation, which is expensive.


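Appendix I  Finding the pdf of x, given x < y, where x, y are exponential
random variables.

$$f_x(x_0) = \lambda_1 e^{-\lambda_1 x_0}, \quad x_0 \ge 0$$

$$f_y(y_0) = \lambda_2 e^{-\lambda_2 y_0}, \quad y_0 \ge 0$$

Event A:  x < y

$$f_{x,y|A}(x_0, y_0 | A) = \begin{cases} \lambda_1 \lambda_2 e^{-\lambda_1 x_0} e^{-\lambda_2 y_0} / P(A) & \text{if } (x_0, y_0) \text{ in } A \\ 0 & \text{if } (x_0, y_0) \text{ not in } A \end{cases}$$

Therefore,

$$f_{x|A}(x_0 | A) = \int_{x_0}^{\infty} f_{x,y|A}(x_0, y_0 | A)\, dy_0 = \int_{x_0}^{\infty} \frac{\lambda_1 \lambda_2 e^{-\lambda_1 x_0} e^{-\lambda_2 y_0}}{\lambda_1 / (\lambda_1 + \lambda_2)}\, dy_0$$

$$= (\lambda_1 + \lambda_2)\, e^{-(\lambda_1 + \lambda_2) x_0}, \quad x_0 \ge 0$$

which is the same as the pdf of min(x, y).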
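A quick simulation can illustrate the result of this appendix. If x given x < y is exponential with rate λ1 + λ2, its conditional mean should be 1/(λ1 + λ2), and the event x < y should occur with probability λ1/(λ1 + λ2). The function name and the rates below are hypothetical choices made for the check.

```python
import random


def conditional_stats(lam1, lam2, trials=200_000, seed=0):
    """Estimate E[x | x < y] and P(x < y) for x~Exp(lam1), y~Exp(lam2)."""
    rng = random.Random(seed)
    total = 0.0
    count = 0
    for _ in range(trials):
        x = rng.expovariate(lam1)
        y = rng.expovariate(lam2)
        if x < y:  # keep only samples that fall in event A
            total += x
            count += 1
    return total / count, count / trials


# Hypothetical rates (assumption): lam1 = 2, lam2 = 3.
mean_given_a, prob_a = conditional_stats(2.0, 3.0)
```

For these rates the conditional mean should be near 1/5 and the probability of A near 2/5.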


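Appendix II  Finding the maximum of queueing times at several M/M/1 queues

Let $w_i$ be the waiting time at queue i with arrival rate $\lambda_i$, service
rate $\mu_i$, utilization $\rho_i = \lambda_i / \mu_i$, and $x_N = \max_i w_i$; then

$$F_{x_N}(x_0) = P(x_N \le x_0) = \prod_{i=1}^{N} P(w_i \le x_0) = \prod_{i=1}^{N} \left( 1 - \rho_i e^{-\mu_i (1 - \rho_i) x_0} \right)$$

$$f_{x_N}(x_0) = \frac{d}{dx_0} F_{x_N}(x_0) = \sum_{i=1}^{N} \rho_i \mu_i (1 - \rho_i)\, e^{-\mu_i (1 - \rho_i) x_0} \prod_{j \ne i} \left( 1 - \rho_j e^{-\mu_j (1 - \rho_j) x_0} \right)$$

$$E(x_N) = \int_0^{\infty} \left( 1 - F_{x_N}(x_0) \right) dx_0$$

For the special case that $x_2 = \max(w_1, w_2)$, we have

$$E(x_2) = \int_0^{\infty} \left[ 1 - \left( 1 - \rho_1 e^{-\mu_1 (1 - \rho_1) x_0} \right) \left( 1 - \rho_2 e^{-\mu_2 (1 - \rho_2) x_0} \right) \right] dx_0$$

$$= \int_0^{\infty} \left[ \rho_1 e^{-\mu_1 (1 - \rho_1) x_0} + \rho_2 e^{-\mu_2 (1 - \rho_2) x_0} - \rho_1 \rho_2 e^{-(\mu_1 (1 - \rho_1) + \mu_2 (1 - \rho_2)) x_0} \right] dx_0$$

$$= \frac{\rho_1}{\mu_1 (1 - \rho_1)} + \frac{\rho_2}{\mu_2 (1 - \rho_2)} - \frac{\rho_1 \rho_2}{\mu_1 (1 - \rho_1) + \mu_2 (1 - \rho_2)}$$

In general, if $x_N = \max(w_1, w_2, \ldots, w_N)$, we see that

$$E(x_N) = \sum_{i} \frac{\rho_i}{\mu_i (1 - \rho_i)} - \sum_{i < j} \frac{\rho_i \rho_j}{\mu_i (1 - \rho_i) + \mu_j (1 - \rho_j)} + \sum_{i < j < m} \frac{\rho_i \rho_j \rho_m}{\mu_i (1 - \rho_i) + \mu_j (1 - \rho_j) + \mu_m (1 - \rho_m)} - \cdots + (-1)^{N+1} \frac{\prod_{i=1}^{N} \rho_i}{\sum_{k=1}^{N} \mu_k (1 - \rho_k)}$$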
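The expected maximum of M/M/1 queueing times can be evaluated by summing over all non-empty subsets of queues with alternating signs. This follows from expanding 1 - F(x), where F is the product of the terms (1 - ρ_i exp(-μ_i(1 - ρ_i)x)), and integrating term by term. The sketch below is a direct transcription of that expansion, assuming independent queues; the function name and the numeric inputs are hypothetical.

```python
from itertools import combinations
from math import prod


def expected_max_mm1_wait(rhos, mus):
    """E[max_i w_i] for independent M/M/1 queueing times.

    Uses P(w_i <= x) = 1 - rho_i * exp(-mu_i * (1 - rho_i) * x) and
    inclusion-exclusion over all non-empty subsets of queues.
    """
    decay = [mu * (1.0 - rho) for rho, mu in zip(rhos, mus)]
    n = len(rhos)
    total = 0.0
    for size in range(1, n + 1):
        sign = 1.0 if size % 2 == 1 else -1.0
        for subset in combinations(range(n), size):
            weight = prod(rhos[i] for i in subset)
            rate = sum(decay[i] for i in subset)
            total += sign * weight / rate
    return total


# Hypothetical inputs (assumption): two queues with rho = 0.5, mu = 2.
two_queue_value = expected_max_mm1_wait([0.5, 0.5], [2.0, 2.0])
```

For a single queue the sum reduces to ρ/(μ(1 - ρ)), the familiar mean M/M/1 waiting time in queue. The cost grows as 2^N, so the direct expansion is only practical for a handful of queues.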


Appendix III  SDD-1: General Case


In this Appendix, we derive the probability of read rejection and the
expected wait until the read condition is satisfied when there are more
than two conflicting transaction classes.

Let $T_i, T_j, \ldots, T_l$ denote transactions,

$I, J, \ldots, L$ denote nodes,

$N(i), N(j), \ldots, N(l)$ denote the originating nodes of transactions
$i, j, \ldots, l$,

$a_i$ = interarrival time of $T_i$ at $N(i)$,

$t_{IJ}$ = transmission delay between nodes I and J,

$P_{il}^{L}$ = P($T_i$ rejected by $T_l$ at node L),

$W_{il}^{L}$ = (period of time $T_i$ has to wait at node L for its read condition
to be satisfied when it is running protocol P3 against $T_l$ |
$T_i$ is not read rejected).

Suppose $T_i$ is running protocol P3 against $T_j$, $j \in G(J)$ at node J, $T_k$,
$k \in G(K)$ at node K, ..., $T_m$, $m \in G(M)$ at node M. $G(J)$ denotes the set of
transactions that $T_i$ conflicts with at node J.

$$P(T_i \text{ is rejected}) = 1 - P(T_i \text{ not rejected}) = 1 - \prod_{\alpha = I, J, \ldots, L} P(T_i \text{ not rejected at node } \alpha)$$

$$= 1 - \prod_{\alpha = I, J, \ldots, L}\ \prod_{\beta \in G(\alpha)} \left( 1 - P_{i\beta}^{\alpha} \right) \qquad \text{(III.1)}$$

(Query Processing Delay | $T_i$ not rejected)

= (Wait for all read conditions to be satisfied | $T_i$ not rejected)

$$= \max_{\alpha = I, J, \ldots, L} \left( t_{N(i)\alpha} + \max_{\beta \in G(\alpha)} \left( W_{i\beta}^{\alpha} \right) + t_{\alpha N(i)} \right) \qquad \text{(III.2)}$$

where $P_{i\beta}^{\alpha}$ and $E(W_{i\beta}^{\alpha})$ are given by Equ.(6.28) and (6.29) respectively.

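Appendix IV  Finding the expected value of the maximum of several exponentials.


Let the $y_i$ be k exponentials with means $1/\lambda_i$. We first find the pdf
of $x_k = \max_{i=1,k} (y_i)$.

$$F_{x_k}(x) = P(x_k \le x) = \prod_{i=1}^{k} P(y_i \le x) = \prod_{i=1}^{k} \left( 1 - e^{-\lambda_i x} \right)$$

$$f_{x_k}(x) = \sum_{i=1}^{k} \lambda_i e^{-\lambda_i x} \prod_{j \ne i} \left( 1 - e^{-\lambda_j x} \right)$$

$$E(x_k) = \int_0^{\infty} \left( 1 - F_{x_k}(x) \right) dx = \int_0^{\infty} \left[ 1 - \prod_{i=1}^{k} \left( 1 - e^{-\lambda_i x} \right) \right] dx$$

$$= \sum_{i=1}^{k} \frac{1}{\lambda_i} - \sum_{i < j} \frac{1}{\lambda_i + \lambda_j} + \sum_{i < j < m} \frac{1}{\lambda_i + \lambda_j + \lambda_m} - \cdots + (-1)^{k+1} \frac{1}{\sum_{i=1}^{k} \lambda_i}$$

This is similar to the derivation in Appendix II and will not be
repeated here.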
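For independent exponentials the expected maximum is an alternating sum of 1/(sum of rates) over all non-empty subsets, which is the Appendix II expansion with every ρ set to 1. A minimal sketch, with a hypothetical function name and hypothetical rates:

```python
from itertools import combinations


def expected_max_exponentials(rates):
    """E[max_i y_i] for independent y_i ~ Exp(rates[i]).

    Inclusion-exclusion over all non-empty subsets of the rates.
    """
    n = len(rates)
    total = 0.0
    for size in range(1, n + 1):
        sign = 1.0 if size % 2 == 1 else -1.0
        for subset in combinations(rates, size):
            total += sign / sum(subset)
    return total


# Hypothetical rates (assumption): three identical unit-rate exponentials.
iid_value = expected_max_exponentials([1.0, 1.0, 1.0])
```

For identical unit rates the sum collapses to the harmonic number 1 + 1/2 + 1/3, a convenient sanity check on the signs.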
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